Commit Graph

5736 Commits

Author SHA1 Message Date
Uwe Kleine-König
5886269962 fix file specification in comments
Many files include the filename at the beginning, serveral used a wrong one.

Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 08:58:16 +02:00
Alexander E. Patrakov
148e423f90 Remove obsolete fat_cvf help text
The text removed by the following patch refers to functionality that never
worked, to non-existing documentation file, and to mount options marked as
obsolete in the module.

Signed-off-by: Alexander E. Patrakov <patrakov@ums.usu.ru>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 08:58:15 +02:00
Michael Opdenacker
59c51591a0 Fix occurrences of "the the "
Signed-off-by: Michael Opdenacker <michael@free-electrons.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 08:57:56 +02:00
Robert P. J. Day
beb7dd86a1 Fix misspellings collected by members of KJ list.
Fix the misspellings of "propogate", "writting" and (oh, the shame
:-) "kenrel" in the source tree.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 07:14:03 +02:00
WANG Cong
ccf6780dc3 Style fix in fs/select.c
Signed-off-by: WANG Cong  <xiyou.wangcong@gmail.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 07:10:02 +02:00
Ronni Nielsen
0f8952c2fa fs/libfs.c: >80 columns line break fix
Signed-off-by: Ronni Nielsen <theronni@gmail.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 06:44:57 +02:00
David Rientjes
4b8df8915a smaps: only define clear_refs for CONFIG_MMU
/proc/pid/clear_refs is only defined in the CONFIG_MMU case, so make sure we
don't have any references to clear_refs_smap() in generic procfs code.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 20:41:14 -07:00
Linus Torvalds
7b82dc0e64 Remove suid/sgid bits on [f]truncate()
.. to match what we do on write().  This way, people who write to files
by using [f]truncate + writable mmap have the same semantics as if they
were using the write() family of system calls.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 20:10:00 -07:00
Linus Torvalds
60c9b2746f Merge git://oss.sgi.com:8090/xfs/xfs-2.6
* git://oss.sgi.com:8090/xfs/xfs-2.6:
  [XFS] Add lockdep support for XFS
  [XFS] Fix race in xfs_write() b/w dmapi callout and direct I/O checks.
  [XFS] Get rid of redundant "required" in msg.
  [XFS] Export via a function xfs_buftarg_list for use by kdb/xfsidbg.
  [XFS] Remove unused ilen variable and references.
  [XFS] Fix to prevent the notorious 'NULL files' problem after a crash.
  [XFS] Fix race condition in xfs_write().
  [XFS] Fix uquota and oquota enforcement problems.
  [XFS] propogate return codes from flush routines
  [XFS] Fix quotaon syscall failures for group enforcement requests.
  [XFS] Invalidate quotacheck when mounting without a quota type.
  [XFS] reducing the number of random number functions.
  [XFS] remove more misc. unused args
  [XFS] the "aendp" arg to xfs_dir2_data_freescan is always NULL, remove it.
  [XFS] The last argument "lsn" of xfs_trans_commit() is always called with
2007-05-08 11:59:33 -07:00
Linus Torvalds
02a93208ed Merge branch 'for-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-block
* 'for-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-block:
  [PATCH] ll_rw_blk: fix missing bounce in blk_rq_map_kern()
  [PATCH] splice: always call into page_cache_readahead()
  [PATCH] splice(): fix interaction with readahead
2007-05-08 11:34:52 -07:00
Linus Torvalds
18062a91d2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
  JFS: Fix race waking up jfsIO kernel thread
  JFS: use __set_current_state()
  Copy i_flags to jfs inode flags on write
  JFS: document uid, gid, and umask mount options in jfs.txt
2007-05-08 11:32:30 -07:00
Dmitriy Monakhov
951744fea0 udf: possible null pointer dereference while load_partition
sb_read may return NULL, let's explicitly check it.

Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:22 -07:00
Jan Kara
31170b6ad4 udf: support files larger than 1G
Make UDF work correctly for files larger than 1GB.  As no extent can be
longer than (1<<30)-blocksize bytes, we have to create several extents if a
big hole is being created.  As a side-effect, we now don't discard
preallocated blocks when creating a hole.

Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:21 -07:00
Jan Kara
948b9b2c96 udf: add assertions
Add a few assertions into udf_discard_prealloc() to check that the file is
sane (mostly helps debugging further patches ;).

Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:21 -07:00
Jan Kara
3bf25cb40d udf: use get_bh()
Make UDF use get_bh() instead of directly accessing b_count and use
brelse() instead of udf_release_data() which does just brelse()...

Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:21 -07:00
Jan Kara
ff116fc8d1 UDF: introduce struct extent_position
Introduce a structure extent_position to store a position of an extent and
the corresponding buffer_head in one place.

Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:21 -07:00
Jan Kara
60448b1d6d udf: use sector_t and loff_t for file offsets
Use sector_t and loff_t for file offsets in UDF filesystem.  Otherwise an
overflow may occur for long files.  Also make inode_bmap() return offset in
the extent in number of blocks instead of number of bytes - for most
callers this is more convenient.

Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:21 -07:00
Peter Zijlstra
277866a0e3 nfs: fix congestion control: use atomic_longs
Change the atomic_t in struct nfs_server to atomic_long_t in anticipation
of machines that can handle 8+TB of (4K) pages under writeback.

However I suspect other things in NFS will start going *bang* by then.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:21 -07:00
Ulrich Drepper
1c710c896e utimensat implementation
Implement utimensat(2) which is an extension to futimesat(2) in that it

a) supports nano-second resolution for the timestamps
b) allows to selectively ignore the atime/mtime value
c) allows to selectively use the current time for either atime or mtime
d) supports changing the atime/mtime of a symlink itself along the lines
   of the BSD lutimes(3) functions

For this change the internally used do_utimes() functions was changed to
accept a timespec time value and an additional flags parameter.

Additionally the sys_utime function was changed to match compat_sys_utime
which already use do_utimes instead of duplicating the work.

Also, the completely missing futimensat() functionality is added.  We have
such a function in glibc but we have to resort to using /proc/self/fd/* which
not everybody likes (chroot etc).

Test application (the syscall number will need per-arch editing):

#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <sys/time.h>
#include <stddef.h>
#include <syscall.h>

#define __NR_utimensat 280

#define UTIME_NOW       ((1l << 30) - 1l)
#define UTIME_OMIT      ((1l << 30) - 2l)

int
main(void)
{
  int status = 0;

  int fd = open("ttt", O_RDWR|O_CREAT|O_EXCL, 0666);
  if (fd == -1)
    error (1, errno, "failed to create test file \"ttt\"");

  struct stat64 st1;
  if (fstat64 (fd, &st1) != 0)
    error (1, errno, "fstat failed");

  struct timespec t[2];
  t[0].tv_sec = 0;
  t[0].tv_nsec = 0;
  t[1].tv_sec = 0;
  t[1].tv_nsec = 0;
  if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
    error (1, errno, "utimensat failed");

  struct stat64 st2;
  if (fstat64 (fd, &st2) != 0)
    error (1, errno, "fstat failed");

  if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
    {
      puts ("atim not reset to zero");
      status = 1;
    }
  if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
    {
      puts ("mtim not reset to zero");
      status = 1;
    }
  if (status != 0)
    goto out;

  t[0] = st1.st_atim;
  t[1].tv_sec = 0;
  t[1].tv_nsec = UTIME_OMIT;
  if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
    error (1, errno, "utimensat failed");

  if (fstat64 (fd, &st2) != 0)
    error (1, errno, "fstat failed");

  if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
      || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
    {
      puts ("atim not set");
      status = 1;
    }
  if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
    {
      puts ("mtim changed from zero");
      status = 1;
    }
  if (status != 0)
    goto out;

  t[0].tv_sec = 0;
  t[0].tv_nsec = UTIME_OMIT;
  t[1] = st1.st_mtim;
  if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
    error (1, errno, "utimensat failed");

  if (fstat64 (fd, &st2) != 0)
    error (1, errno, "fstat failed");

  if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
      || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
    {
      puts ("mtim changed from original time");
      status = 1;
    }
  if (st2.st_mtim.tv_sec != st1.st_mtim.tv_sec
      || st2.st_mtim.tv_nsec != st1.st_mtim.tv_nsec)
    {
      puts ("mtim not set");
      status = 1;
    }
  if (status != 0)
    goto out;

  sleep (2);

  t[0].tv_sec = 0;
  t[0].tv_nsec = UTIME_NOW;
  t[1].tv_sec = 0;
  t[1].tv_nsec = UTIME_NOW;
  if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
    error (1, errno, "utimensat failed");

  if (fstat64 (fd, &st2) != 0)
    error (1, errno, "fstat failed");

  struct timeval tv;
  gettimeofday(&tv,NULL);

  if (st2.st_atim.tv_sec <= st1.st_atim.tv_sec
      || st2.st_atim.tv_sec > tv.tv_sec)
    {
      puts ("atim not set to NOW");
      status = 1;
    }
  if (st2.st_mtim.tv_sec <= st1.st_mtim.tv_sec
      || st2.st_mtim.tv_sec > tv.tv_sec)
    {
      puts ("mtim not set to NOW");
      status = 1;
    }

  if (symlink ("ttt", "tttsym") != 0)
    error (1, errno, "cannot create symlink");

  t[0].tv_sec = 0;
  t[0].tv_nsec = 0;
  t[1].tv_sec = 0;
  t[1].tv_nsec = 0;
  if (syscall(__NR_utimensat, AT_FDCWD, "tttsym", t, AT_SYMLINK_NOFOLLOW) != 0)
    error (1, errno, "utimensat failed");

  if (lstat64 ("tttsym", &st2) != 0)
    error (1, errno, "lstat failed");

  if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
    {
      puts ("symlink atim not reset to zero");
      status = 1;
    }
  if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
    {
      puts ("symlink mtim not reset to zero");
      status = 1;
    }
  if (status != 0)
    goto out;

  t[0].tv_sec = 1;
  t[0].tv_nsec = 0;
  t[1].tv_sec = 1;
  t[1].tv_nsec = 0;
  if (syscall(__NR_utimensat, fd, NULL, t, 0) != 0)
    error (1, errno, "utimensat failed");

  if (fstat64 (fd, &st2) != 0)
    error (1, errno, "fstat failed");

  if (st2.st_atim.tv_sec != 1 || st2.st_atim.tv_nsec != 0)
    {
      puts ("atim not reset to one");
      status = 1;
    }
  if (st2.st_mtim.tv_sec != 1 || st2.st_mtim.tv_nsec != 0)
    {
      puts ("mtim not reset to one");
      status = 1;
    }

  if (status == 0)
     puts ("all OK");

 out:
  close (fd);
  unlink ("ttt");
  unlink ("tttsym");

  return status;
}

[akpm@linux-foundation.org: add missing i386 syscall table entry]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Cc: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:18 -07:00
Jeff Layton
1a1c9bb433 inode numbering: change libfs sb creation routines to avoid collisions with their root inodes
This patch makes it so that simple_fill_super and get_sb_pseudo assign their
root inodes to be number 1.  It also fixes up a couple of callers of
simple_fill_super that were passing in files arrays that had an index at
number 1, and adds a warning for any caller that sends in such an array.

It would have been nice to have made it so that it wasn't possible to make
such a collision, but some callers need to be able to control what inode
number their entries get, so I think this is the best that can be done.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:16 -07:00
Jeff Layton
866b04fccb inode numbering: make static counters in new_inode and iunique be 32 bits
The problems are:

- on filesystems w/o permanent inode numbers, i_ino values can be larger
  than 32 bits, which can cause problems for some 32 bit userspace programs on
  a 64 bit kernel.  We can't do anything for filesystems that have actual
  >32-bit inode numbers, but on filesystems that generate i_ino values on the
  fly, we should try to have them fit in 32 bits.  We could trivially fix this
  by making the static counters in new_inode and iunique 32 bits, but...

- many filesystems call new_inode and assume that the i_ino values they are
  given are unique.  They are not guaranteed to be so, since the static
  counter can wrap.  This problem is exacerbated by the fix for #1.

- after allocating a new inode, some filesystems call iunique to try to get
  a unique i_ino value, but they don't actually add their inodes to the
  hashtable, and so they're still not guaranteed to be unique if that counter
  wraps.

This patch set takes the simpler approach of simply using iunique and hashing
the inodes afterward.  Christoph H.  previously mentioned that he thought that
this approach may slow down lookups for filesystems that currently hash their
inodes.

The questions are:

1) how much would this slow down lookups for these filesystems?
2) is it enough to justify adding more infrastructure to avoid it?

What might be best is to start with this approach and then only move to using
IDR or some other scheme if these extra inodes in the hashtable prove to be
problematic.

I've done some cursory testing with this patch and the overhead of hashing and
unhashing the inodes with pipefs is pretty low -- just a few seconds of system
time added on to the creation and destruction of 10 million pipes (very
similar to the overhead that the IDR approach would add).

The hard thing to measure is what effect this has on other filesystems. I'm
open to ways to try and gauge this.

Again, I've only converted pipefs as an example. If this approach is
acceptable then I'll start work on patches to convert other filesystems.

With a pretty-much-worst-case microbenchmark provided by Eric Dumazet
<dada1@cosmosbay.com>:

hashing patch (pipebench):
sys     1m15.329s
sys     1m16.249s
sys     1m17.169s

unpatched (pipebench):
sys     1m9.836s
sys     1m12.541s
sys     1m14.153s

Which works out to 1.05642174294555027017.  So ~5-6% slowdown.

This patch:

When a 32-bit program that was not compiled with large file offsets does a
stat and gets a st_ino value back that won't fit in the 32 bit field, glibc
(correctly) generates an EOVERFLOW error.  We can't do anything about fs's
with larger permanent inode numbers, but when we generate them on the fly, we
ought to try and have them fit within a 32 bit field.

This patch takes the first step toward this by making the static counters in
these two functions be 32 bits.

[jlayton@redhat.com: mention that it's only the case for 32bit, non-LFS stat]
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:16 -07:00
Alexey Kuznetsov
b140f25108 Invalid return value of execve() resulting in oopses
When elf loader fails to map executable (due to memory shortage or because
binary is malformed), it can return 0.  Normally, this is invisible because
process is killed with SIGKILL and it never returns to user space.

But if exec() is called from kernel thread (hotplug, whatever)
consequences are more interesting and vary depending on architecture.

i386.   Nothing especially interesting, execve() just returns
        with "success"  :-)

x86_64. Fake zero frame is used on way to caller, RSP/RIP are loaded
        with zeros, ergo... double fault.

ia64.   Similar to i386, but r32...r95 are corrupted. Sometimes it
        oopses due to return to zero PC, sometimes it sees NaT in
        rXX and oopses due to NaT consumption.

Signed-off-by: Alexey Kuznetsov <alexey@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:15 -07:00
Akinobu Mita
0c28f287aa procfs: use simple_read_from_buffer()
Cleanup using simple_read_from_buffer() in procfs.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:14 -07:00
Andreas Schwab
83ae1b79c8 Fix error handling in HDIO_GETGEO compat wrapper
Don't clobber error from sys_ioctl in HDIO_GETGEO compat wrapper.

Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:14 -07:00
Stephen Mollett
c007c06e3c udf: decrement correct link count in udf_rmdir
It appears that a minor thinko occurred in udf_rmdir and the
(already-cleared) link count on the directory that is being removed was
being decremented instead of the link count on its parent directory.  This
gives rise to lots of kernel messages similar to:

UDF-fs warning (device loop1): udf_rmdir: empty directory has nlink != 2 (8)

when removing directory trees.  No other ill effects have been observed but
I guess it could theoretically result in the link count overflowing on a
very long-lived, much modified directory.

Signed-off-by: Stephen Mollett <molletts@yahoo.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:14 -07:00
OGAWA Hirofumi
c483bab099 fat: fix VFAT compat ioctls on 64-bit systems
If you compile and run the below test case in an msdos or vfat directory on
an x86-64 system with -m32 you'll get garbage in the kernel_dirent struct
followed by a SIGSEGV.

The patch fixes this.

Reported and initial fix by Bart Oldeman

#include <sys/types.h>
#include <sys/ioctl.h>
#include <dirent.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
struct kernel_dirent {
         long            d_ino;
         long		d_off;
         unsigned short  d_reclen;
         char            d_name[256]; /* We must not include limits.h! */
};
#define VFAT_IOCTL_READDIR_BOTH  _IOR('r', 1, struct kernel_dirent [2])
#define VFAT_IOCTL_READDIR_SHORT  _IOR('r', 2, struct kernel_dirent [2])

int main(void)
{
         int fd = open(".", O_RDONLY);
         struct kernel_dirent de[2];

         while (1) {
                 int i = ioctl(fd, VFAT_IOCTL_READDIR_BOTH, (long)de);
                 if (i == -1) break;
                 if (de[0].d_reclen == 0) break;
                 printf("SFN: reclen=%2d off=%d ino=%d, %-12s",
 		       de[0].d_reclen, de[0].d_off, de[0].d_ino, de[0].d_name);
 		if (de[1].d_reclen)
 		  printf("\tLFN: reclen=%2d off=%d ino=%d, %s",
 		    de[1].d_reclen, de[1].d_off, de[1].d_ino, de[1].d_name);
 		printf("\n");
         }
         return 0;
}

Signed-off-by: Bart Oldeman <bartoldeman@users.sourceforge.net>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:14 -07:00
Jan Kara
4f99ed67cc ext3: copy i_flags to inode flags on write
Propagate flags such as S_APPEND, S_IMMUTABLE, etc.  from i_flags into
ext2-specific i_flags.  Hence, when someone sets these flags via a different
interface than ioctl, they are stored correctly.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:13 -07:00
OGAWA Hirofumi
28ec039c21 fat: don't use free_clusters for fat32
It seems that the recent Windows changed specification, and it's
undocumented.  Windows doesn't update ->free_clusters correctly.

This patch doesn't use ->free_clusters by default.  (instead, add "usefree"
for forcing to use it)

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Juergen Beisert <juergen127@kreuzholzen.de>
Cc: Andreas Schwab <schwab@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:13 -07:00
Milind Arun Choudhary
5ab2f7e0fd reiserfs: use __set_current_state()
use __set_current_state(TASK_*) instead of current->state = TASK_*, in
fs/reiserfs

Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:13 -07:00
Pavel Emelianov
97f0678467 jbd: check for error returned by kthread_create on creating journal thread
If the thread failed to create the subsequent wait_event will hang forever.

This is likely to happen if kernel hits max_threads limit.

Will be critical for virtualization systems that limit the number of tasks
and kernel memory usage within the container.

(akpm: JBD should be converted fully to the kthread API: kthread_should_stop()
and kthread_stop()).

Cc: <linux-ext4@vger.kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:13 -07:00
Miklos Szeredi
ee6f958291 check privileges before setting mount propagation
There's a missing check for CAP_SYS_ADMIN in do_change_type().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:12 -07:00
Jan Kara
28be5abb40 ext3: copy i_flags to inode flags on write
A patch that stores inode flags such as S_IMMUTABLE, S_APPEND, etc.  from
i_flags to EXT3_I(inode)->i_flags when inode is written to disk.  The same
thing is done on GETFLAGS ioctl.

Quota code changes these flags on quota files (to make it harder for
sysadmin to screw himself) and these changes were not correctly propagated
into the filesystem (especially, lsattr did not show them and users were
wondering...).

Propagate flags such as S_APPEND, S_IMMUTABLE, etc.  from i_flags into
ext3-specific i_flags.  Hence, when someone sets these flags via a
different interface than ioctl, they are stored correctly.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:12 -07:00
Pavel Emelianov
b5e618181a Introduce a handy list_first_entry macro
There are many places in the kernel where the construction like

   foo = list_entry(head->next, struct foo_struct, list);

are used.
The code might look more descriptive and neat if using the macro

   list_first_entry(head, type, member) \
             list_entry((head)->next, type, member)

Here is the macro itself and the examples of its usage in the generic code.
 If it will turn out to be useful, I can prepare the set of patches to
inject in into arch-specific code, drivers, networking, etc.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:11 -07:00
Eric W. Biederman
1bd0cf1fc7 smbfs: remove unnecessary allow_signal
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:11 -07:00
Jeffrey Layton
3361c7bebb make iunique use a do/while loop rather than its obscure goto loop
A while back, Christoph mentioned that he thought that iunique ought to be
cleaned up to use a more conventional loop construct. This patch does that,
turning the strange goto loop into a do/while.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:10 -07:00
John Johansen
9d0633cfed Remove redundant check from proc_sys_setattr()
notify_change() already calls security_inode_setattr() before
calling iop->setattr.

Alan sayeth

  This is a behaviour change on all of these and limits some behaviour of
  existing established security modules

  When inode_change_ok is called it has side effects.  This includes
  clearing the SGID bit on attribute changes caused by chmod.  If you make
  this change the results of some rulesets may be different before or after
  the change is made.

  I'm not saying the change is wrong but it does change behaviour so that
  needs looking at closely (ditto all other attribute twiddles)

Signed-off-by: Steve Beattie <sbeattie@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: John Johansen <jjohansen@suse.de>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:10 -07:00
John Johansen
1e8123fded Remove redundant check from proc_setattr()
notify_change() already calls security_inode_setattr() before
calling iop->setattr.

Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: John Johansen <jjohansen@suse.de>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:10 -07:00
Martin Peschke
09f0892ec7 proc: cleanup: use seq_release_private() where appropriate
We can save some lines of code by using seq_release_private().

Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:09 -07:00
Christoph Hellwig
6272e26679 cleanup compat ioctl handling
Merge all compat ioctl handling into compat_ioctl.c instead of splitting it
over compat.c and compat_ioctl.c.  This also allows to get rid of ioctl32.h

Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks-good-to: Andi Kleen <ak@suse.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:09 -07:00
Philippe De Muyter
19d0e8ce85 partition: add support for sysv68 partitions
Add support for the Motorola sysv68 disk partition (slices in motorola
doc).

Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:09 -07:00
Christoph Hellwig
644fd4f5de merge compat_ioctl.h into compat_ioctl.c
Now that there is no arch-specific compat ioctl handling left there is not
point in having a separate copat_ioctl.h, so merge it into compat_ioctl.c

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:09 -07:00
Milind Arun Choudhary
1525dccbc2 ROUND_UP macro cleanup in fs/smbfs/request.c
ROUND_UP macro cleanup use ALIGN

Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:09 -07:00
Milind Arun Choudhary
022a169244 ROUND_UP macro cleanup in fs/(select|compat|readdir).c
ROUND_UP macro cleanup use,ALIGN or DIV_ROUND_UP where ever appropriate.

Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:09 -07:00
Alexey Dobriyan
7e80d0d0b6 i386: sched.h inclusion from module.h is baack
linux/module.h
  -> linux/elf.h
     -> asm-i386/elf.h
        -> linux/utsname.h
           -> linux/sched.h

Noticeably cut the number of files which are rebuild upon touching sched.h
and cut down pulled junk from every module.h inclusion.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:08 -07:00
Alexey Dobriyan
9d65cb4a17 Fix race between cat /proc/*/wchan and rmmod et al
kallsyms_lookup() can go iterating over modules list unprotected which is OK
for emergency situations (oops), but not OK for regular stuff like
/proc/*/wchan.

Introduce lookup_symbol_name()/lookup_module_symbol_name() which copy symbol
name into caller-supplied buffer or return -ERANGE.  All copying is done with
module_mutex held, so...

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:08 -07:00
Alexey Dobriyan
ffb4512276 Simplify kallsyms_lookup()
Several kallsyms_lookup() pass dummy arguments but only need, say, module's
name.  Make kallsyms_lookup() accept NULLs where possible.

Also, makes picture clearer about what interfaces are needed for all symbol
resolving business.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:08 -07:00
kalash nainwal
98701d1b0f (re)register_binfmt returns with -EBUSY
When a binary format is unregistered and re-registered, register_binfmt
fails with -EBUSY.  The reason is that unregister_binfmt does not set
fmt->next to NULL, and seeing (fmt->next != NULL), register_binfmt fails
with -EBUSY.

One can find his way around by explicitly setting fmt->next to NULL after
unregistering, but that is kind of unclean (one should better be using only
the interfaces, and not the interal members, isn't it?)

Attached one-liner can fix it.

Signed-off-by: Kalash Nainwal <kalash.nainwal@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:08 -07:00
Randy Dunlap
e63340ae6b header cleaning: don't include smp_lock.h when not used
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:07 -07:00
Adrian Bunk
e5f00f42f3 make remove_inode_dquot_ref() static
remove_inode_dquot_ref() can now become static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:05 -07:00
Alexey Dobriyan
ca509f69de Protect tty drivers list with tty_mutex
Additions and removal from tty_drivers list were just done as well as
iterating on it for /proc/tty/drivers generation.

testing: modprobe/rmmod loop of simple module which does nothing but
tty_register_driver() vs cat /proc/tty/drivers loop

BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
 printing eip:
c01cefa7
*pde = 00000000
Oops: 0000 [#1]
PREEMPT
last sysfs file: devices/pci0000:00/0000:00:1d.7/usb5/5-0:1.0/bInterfaceProtocol
Modules linked in: ohci_hcd af_packet e1000 ehci_hcd uhci_hcd usbcore xfs
CPU:    0
EIP:    0060:[<c01cefa7>]    Not tainted VLI
EFLAGS: 00010297   (2.6.21-rc4-mm1 #4)
EIP is at vsnprintf+0x3a4/0x5fc
eax: 6b6b6b6b   ebx: f6cb50f2   ecx: 6b6b6b6b   edx: fffffffe
esi: c0354700   edi: f6cb6000   ebp: 6b6b6b6b   esp: f31f5e68
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process cat (pid: 31864, ti=f31f4000 task=c1998030 task.ti=f31f4000)
Stack: 00000000 c0103f20 c013003a c0103f20 00000000 f6cb50da 0000000a 00000f0e
       f6cb50f2 00000010 00000014 ffffffff ffffffff 00000007 c0354753 f6cb50f2
       f73e39dc f73e39dc 00000001 c0175416 f31f5ed8 f31f5ed4 0ee00000 f32090bc
Call Trace:
 [<c0103f20>] restore_nocheck+0x12/0x15
 [<c013003a>] mark_held_locks+0x6d/0x86
 [<c0103f20>] restore_nocheck+0x12/0x15
 [<c0175416>] seq_printf+0x2e/0x52
 [<c0192895>] show_tty_range+0x35/0x1f3
 [<c0175416>] seq_printf+0x2e/0x52
 [<c0192add>] show_tty_driver+0x8a/0x1d9
 [<c01758f6>] seq_read+0x70/0x2ba
 [<c0175886>] seq_read+0x0/0x2ba
 [<c018d8e6>] proc_reg_read+0x63/0x9f
 [<c015e764>] vfs_read+0x7d/0xb5
 [<c018d883>] proc_reg_read+0x0/0x9f
 [<c015eab1>] sys_read+0x41/0x6a
 [<c0103e4e>] sysenter_past_esp+0x5f/0x99
 =======================
Code: 00 8b 4d 04 e9 44 ff ff ff 8d 4d 04 89 4c 24 50 8b 6d 00 81 fd ff 0f 00 00 b8 a4 c1 35 c0 0f 46 e8 8b 54 24 2c 89 e9 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 89 c6 8b 44 24 28 89
EIP: [<c01cefa7>] vsnprintf+0x3a4/0x5fc SS:ESP 0068:f31f5e68

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:05 -07:00
Mark Fasheh
ef51c97623 Remove do_sync_file_range()
Remove do_sync_file_range() and convert callers to just use
do_sync_mapping_range().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:04 -07:00
Randy Dunlap
880ebdc516 reiserfs: proc support requires PROC_FS
REISER_FS /proc option needs to depend on PROC_FS.

fs/reiserfs/procfs.c: In function 'show_super':
fs/reiserfs/procfs.c:134: error: 'reiserfs_proc_info_data_t' has no member named 'max_hash_collisions'
fs/reiserfs/procfs.c:134: error: 'reiserfs_proc_info_data_t' has no member named 'breads'
fs/reiserfs/procfs.c:135: error: 'reiserfs_proc_info_data_t' has no member named 'bread_miss'
fs/reiserfs/procfs.c:135: error: 'reiserfs_proc_info_data_t' has no member named 'search_by_key'
fs/reiserfs/procfs.c:136: error: 'reiserfs_proc_info_data_t' has no member named 'search_by_key_fs_changed'
fs/reiserfs/procfs.c:136: error: 'reiserfs_proc_info_data_t' has no member named 'search_by_key_restarted'
fs/reiserfs/procfs.c:137: error: 'reiserfs_proc_info_data_t' has no member named 'insert_item_restarted'
fs/reiserfs/procfs.c:137: error: 'reiserfs_proc_info_data_t' has no member named 'paste_into_item_restarted'
fs/reiserfs/procfs.c:138: error: 'reiserfs_proc_info_data_t' has no member named 'cut_from_item_restarted'
fs/reiserfs/procfs.c:139: error: 'reiserfs_proc_info_data_t' has no member named 'delete_solid_item_restarted'
fs/reiserfs/procfs.c:139: error: 'reiserfs_proc_info_data_t' has no member named 'delete_item_restarted'
fs/reiserfs/procfs.c:140: error: 'reiserfs_proc_info_data_t' has no member named 'leaked_oid'
fs/reiserfs/procfs.c:140: error: 'reiserfs_proc_info_data_t' has no member named 'leaves_removable'
fs/reiserfs/procfs.c: In function 'show_per_level':
fs/reiserfs/procfs.c:184: error: 'reiserfs_proc_info_data_t' has no member named 'balance_at'
fs/reiserfs/procfs.c:185: error: 'reiserfs_proc_info_data_t' has no member named 'sbk_read_at'
fs/reiserfs/procfs.c:186: error: 'reiserfs_proc_info_data_t' has no member named 'sbk_fs_changed'
fs/reiserfs/procfs.c:187: error: 'reiserfs_proc_info_data_t' has no member named 'sbk_restarted'
fs/reiserfs/procfs.c:188: error: 'reiserfs_proc_info_data_t' has no member named 'free_at'
fs/reiserfs/procfs.c:189: error: 'reiserfs_proc_info_data_t' has no member named 'items_at'
fs/reiserfs/procfs.c:190: error: 'reiserfs_proc_info_data_t' has no member named 'can_node_be_removed'
fs/reiserfs/procfs.c:191: error: 'reiserfs_proc_info_data_t' has no member named 'lnum'
fs/reiserfs/procfs.c:192: error: 'reiserfs_proc_info_data_t' has no member named 'rnum'
fs/reiserfs/procfs.c:193: error: 'reiserfs_proc_info_data_t' has no member named 'lbytes'
fs/reiserfs/procfs.c:194: error: 'reiserfs_proc_info_data_t' has no member named 'rbytes'
fs/reiserfs/procfs.c:195: error: 'reiserfs_proc_info_data_t' has no member named 'get_neighbors'
fs/reiserfs/procfs.c:196: error: 'reiserfs_proc_info_data_t' has no member named 'get_neighbors_restart'
fs/reiserfs/procfs.c:197: error: 'reiserfs_proc_info_data_t' has no member named 'need_l_neighbor'
fs/reiserfs/procfs.c:197: error: 'reiserfs_proc_info_data_t' has no member named 'need_r_neighbor'
fs/reiserfs/procfs.c: In function 'show_bitmap':
fs/reiserfs/procfs.c:224: error: 'reiserfs_proc_info_data_t' has no member named 'free_block'
fs/reiserfs/procfs.c:225: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
fs/reiserfs/procfs.c:226: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
fs/reiserfs/procfs.c:227: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
fs/reiserfs/procfs.c:228: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
fs/reiserfs/procfs.c:229: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
fs/reiserfs/procfs.c:230: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
fs/reiserfs/procfs.c:230: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap'
fs/reiserfs/procfs.c: In function 'show_journal':
fs/reiserfs/procfs.c:384: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
fs/reiserfs/procfs.c:385: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
fs/reiserfs/procfs.c:386: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
fs/reiserfs/procfs.c:387: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
fs/reiserfs/procfs.c:388: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
fs/reiserfs/procfs.c:389: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
fs/reiserfs/procfs.c:390: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
fs/reiserfs/procfs.c:391: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
fs/reiserfs/procfs.c:392: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
fs/reiserfs/procfs.c:393: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
fs/reiserfs/procfs.c:394: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
fs/reiserfs/procfs.c:395: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
fs/reiserfs/procfs.c:395: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
fs/reiserfs/procfs.c:395: error: 'reiserfs_proc_info_data_t' has no member named 'journal'
fs/reiserfs/procfs.c: In function 'reiserfs_proc_info_init':
fs/reiserfs/procfs.c:504: warning: implicit declaration of function '__PINFO'
fs/reiserfs/procfs.c:504: error: request for member 'lock' in something not a structure or union
fs/reiserfs/procfs.c: In function 'reiserfs_proc_info_done':
fs/reiserfs/procfs.c:544: error: request for member 'lock' in something not a structure or union
fs/reiserfs/procfs.c:545: error: request for member 'exiting' in something not a structure or union
fs/reiserfs/procfs.c:546: error: request for member 'lock' in something not a structure or union

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:04 -07:00
Alexey Dobriyan
19c5d45a09 /proc/*/oom_score oops re badness
Eternal quest to make

	while true; do cat /proc/fs/xfs/stat >/dev/null 2>/dev/null; done
	while true; do find /proc -type f 2>/dev/null | xargs cat >/dev/null 2>/dev/null; done
	while true; do modprobe xfs; rmmod xfs; done

work reliably continues and now kernel oopses in the following way:

BUG: unable to handle ... at virtual address 6b6b6b6b
EIP is at badness
process: cat
	proc_oom_score
	proc_info_read
	sys_fstat64
	vfs_read
	proc_info_read
	sys_read

Failing code is prefetch hidden in list_for_each_entry() in badness().
badness() is reachable from two points. One is proc_oom_score, another
is out_of_memory() => select_bad_process() => badness().

Second path grabs tasklist_lock, while first doesn't.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:04 -07:00
Eric Dumazet
c23fbb6bcb VFS: delay the dentry name generation on sockets and pipes
1) Introduces a new method in 'struct dentry_operations'.  This method
   called d_dname() might be called from d_path() to build a pathname for
   special filesystems.  It is called without locks.

   Future patches (if we succeed in having one common dentry for all
   pipes/sockets) may need to change prototype of this method, but we now
   use : char *d_dname(struct dentry *dentry, char *buffer, int buflen);

2) Adds a dynamic_dname() helper function that eases d_dname() implementations

3) Defines d_dname method for sockets : No more sprintf() at socket
   creation.  This is delayed up to the moment someone does an access to
   /proc/pid/fd/...

4) Defines d_dname method for pipes : No more sprintf() at pipe
   creation.  This is delayed up to the moment someone does an access to
   /proc/pid/fd/...

A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a
*nice* speedup on my Pentium(M) 1.6 Ghz :

3.090 s instead of 3.450 s

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:03 -07:00
Miklos Szeredi
2793274298 add file position info to proc
Add support for finding out the current file position, open flags and
possibly other info in the future.

These new entries are added:

  /proc/PID/fdinfo/FD
  /proc/PID/task/TID/fdinfo/FD

For each fd the information is provided in the following format:

pos:	1234
flags:	0100002

[bunk@stusta.de: make struct proc_fdinfo_file_operations static]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:03 -07:00
Eric Dumazet
c5141e6d64 procfs: reorder struct pid_dentry to save space on 64bit archs, and constify them
Change the order of fields of struct pid_entry (file fs/proc/base.c) in order
to avoid a hole on 64bit archs.  (8 bytes saved per object)

Also change all pid_entry arrays to be const qualified, to make clear they
must not be modified.

Before (on x86_64) :

# size fs/proc/base.o
   text    data     bss     dec     hex filename
  15549    2192       0   17741    454d fs/proc/base.o

After :

# size fs/proc/base.o
   text    data     bss     dec     hex filename
  17229     176       0   17405    43fd fs/proc/base.o

Thats 336 bytes saved on kernel size on x86_64

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:03 -07:00
Kees Cook
5096add84b proc: maps protection
The /proc/pid/ "maps", "smaps", and "numa_maps" files contain sensitive
information about the memory location and usage of processes.  Issues:

- maps should not be world-readable, especially if programs expect any
  kind of ASLR protection from local attackers.
- maps cannot just be 0400 because "-D_FORTIFY_SOURCE=2 -O2" makes glibc
  check the maps when %n is in a *printf call, and a setuid(getuid())
  process wouldn't be able to read its own maps file.  (For reference
  see http://lkml.org/lkml/2006/1/22/150)
- a system-wide toggle is needed to allow prior behavior in the case of
  non-root applications that depend on access to the maps contents.

This change implements a check using "ptrace_may_attach" before allowing
access to read the maps contents.  To control this protection, the new knob
/proc/sys/kernel/maps_protect has been added, with corresponding updates to
the procfs documentation.

[akpm@linux-foundation.org: build fixes]
[akpm@linux-foundation.org: New sysctl numbers are old hat]
Signed-off-by: Kees Cook <kees@outflux.net>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:02 -07:00
Christoph Hellwig
5843205b55 namei.c: remove utterly outdated comment
We don't have a routine called namei() anymore since at least 2.3.x, and
the comment is just totally out of sync with the current lookup logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:02 -07:00
Christoph Hellwig
acb0c854fa vfs: remove superflous sb == NULL checks
inode->i_sb is always set, not need to check for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:02 -07:00
Alexey Dobriyan
578c8183c1 proc: remove pathetic ->deleted WARN_ON
WARN_ON(de && de->deleted); is sooo unreliable. Why?

proc_lookup				remove_proc_entry
===========				=================
lock_kernel();
spin_lock(&proc_subdir_lock);
[find proc entry]
spin_unlock(&proc_subdir_lock);
					spin_lock(&proc_subdir_lock);
					[find proc entry]

proc_get_inode
==============
WARN_ON(de && de->deleted);			...

					if (!atomic_read(&de->count))
						free_proc_entry(de);
					else
						de->deleted = 1;

So, if you have some strange oops [1], and doesn't see this WARN_ON it means
nothing.

[1] try_module_get() of module which doesn't exist, two lines below
    should suffice, or not?

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:02 -07:00
Darrick J. Wong
59cd0cbc75 Fix race between proc_readdir and remove_proc_entry
Fix the following race:

proc_readdir				remove_proc_entry
============				=================

spin_lock(&proc_subdir_lock);
[choose PDE to start filldir from]
spin_unlock(&proc_subdir_lock);
					spin_lock(&proc_subdir_lock);
					[find PDE]
					[free PDE, refcount is 0]
					spin_unlock(&proc_subdir_lock);
		    /* boom */
if (filldir(dirent, de->name, ...

[de_put on error path --adobriyan]
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:02 -07:00
Alexey Dobriyan
7695650a92 Fix race between proc_get_inode() and remove_proc_entry()
proc_lookup				remove_proc_entry
===========				=================

lock_kernel();
spin_lock(&proc_subdir_lock);
[find PDE with refcount 0]
spin_unlock(&proc_subdir_lock);
					spin_lock(&proc_subdir_lock);
					[find PDE with refcount 0]
					[check refcount and free PDE]
					spin_unlock(&proc_subdir_lock);
proc_get_inode:
	de_get(de); /* boom */

Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:01 -07:00
Miklos Szeredi
79c0b2df79 add filesystem subtype support
There's a slight problem with filesystem type representation in fuse
based filesystems.

From the kernel's view, there are just two filesystem types: fuse and
fuseblk.  From the user's view there are lots of different filesystem
types.  The user is not even much concerned if the filesystem is fuse based
or not.  So there's a conflict of interest in how this should be
represented in fstab, mtab and /proc/mounts.

The current scheme is to encode the real filesystem type in the mount
source.  So an sshfs mount looks like this:

  sshfs#user@server:/   /mnt/server    fuse   rw,nosuid,nodev,...

This url-ish syntax works OK for sshfs and similar filesystems.  However
for block device based filesystems (ntfs-3g, zfs) it doesn't work, since
the kernel expects the mount source to be a real device name.

A possibly better scheme would be to encode the real type in the type
field as "type.subtype".  So fuse mounts would look like this:

  /dev/hda1       /mnt/windows   fuseblk.ntfs-3g   rw,...
  user@server:/   /mnt/server    fuse.sshfs        rw,nosuid,nodev,...

This patch adds the necessary code to the kernel so that this can be
correctly displayed in /proc/mounts.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:01 -07:00
Davide Libenzi
6192bd536f epoll: optimizations and cleanups
Epoll is doing multiple passes over the ready set at the moment, because of
the constraints over the f_op->poll() call.  Looking at the code again, I
noticed that we already hold the epoll semaphore in read, and this
(together with other locking conditions that hold while doing an
epoll_wait()) can lead to a smarter way [1] to "ship" events to userspace
(in a single pass).

This is a stress application that can be used to test the new code.  It
spwans multiple thread and call epoll_wait() and epoll_ctl() from many
threads.  Stress tested on my dual Opteron 254 w/out any problems.

http://www.xmailserver.org/totalmess.c

This is not a benchmark, just something that tries to stress and exploit
possible problems with the new code.
Also, I made a stupid micro-benchmark:

http://www.xmailserver.org/epwbench.c

[1] Considering that epoll must be thread-safe, there are five ways we can
    be hit during an epoll_wait() transfer loop (ep_send_events()):

    1) The epoll fd going away and calling ep_free
       This just can't happen, since we did an fget() in sys_epoll_wait

    2) An epoll_ctl(EPOLL_CTL_DEL)
       This can't happen because epoll_ctl() gets ep->sem in write, and
       we're holding it in read during ep_send_events()

    3) An fd stored inside the epoll fd going away
       This can't happen because in eventpoll_release_file() we get
       ep->sem in write, and we're holding it in read during
       ep_send_events()

    4) Another epoll_wait() happening on another thread
       They both can be inside ep_send_events() at the same time, we get
       (splice) the ready-list under the spinlock, so each one will get
       its own ready list. Note that an fd cannot be at the same time
       inside more than one ready list, because ep_poll_callback() will
       not re-queue it if it sees it already linked:

       if (ep_is_linked(&epi->rdllink))
                goto is_linked;

       Another case that can happen, is two concurrent epoll_wait(),
       coming in with a userspace event buffer of size, say, ten.
       Suppose there are 50 event ready in the list. The first
       epoll_wait() will "steal" the whole list, while the second, seeing
       no events, will go to sleep. But at the end of ep_send_events() in
       the first epoll_wait(), we will re-inject surplus ready fds, and we
       will trigger the proper wake_up to the second epoll_wait().

    5) ep_poll_callback() hitting us asyncronously
       This is the tricky part. As I said above, the ep_is_linked() test
       done inside ep_poll_callback(), will guarantee us that until the
       item will result linked to a list, ep_poll_callback() will not try
       to re-queue it again (read, write data on any of its members). When
       we do a list_del() in ep_send_events(), the item will still satisfy
       the ep_is_linked() test (whatever data is written in prev/next,
       it'll never be its own pointer), so ep_poll_callback() will still
       leave us alone. It's only after the eventual smp_mb()+INIT_LIST_HEAD(&epi->rdllink)
       that it'll become visible to ep_poll_callback(), but at the point
       we're already past it.

[akpm@osdl.org: 80 cols]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:01 -07:00
Dmitriy Monakhov
fedee54d8f ext3: dirindex error pointer issues
- ext3_dx_find_entry() exit with out setting proper error pointer

- do_split() exit with out setting proper error pointer
  it is realy painful because many callers contain folowing code:

          de = do_split(handle,dir, &bh, frame, &hinfo, &retval);
          if (!(de))
                       return retval;
          <<< WOW retval wasn't changed by do_split(), so caller failed
          <<< but return SUCCESS :)

- Rearrange do_split() error path. Current error path is realy ugly, all
  this up and down jump stuff doesn't make code easy to understand.

[dmonakhov@sw.ru: fix annoying fake error messages]
Signed-off-by: Monakhov Dmitriy <dmonakhov@openvz.org>
Cc: Andreas Dilger <adilger@clusterfs.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Monakhov Dmitriy <dmonakhov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:01 -07:00
Badari Pulavarty
e3222c4ecc Merge sys_clone()/sys_unshare() nsproxy and namespace handling
sys_clone() and sys_unshare() both makes copies of nsproxy and its associated
namespaces.  But they have different code paths.

This patch merges all the nsproxy and its associated namespace copy/clone
handling (as much as possible).  Posted on container list earlier for
feedback.

- Create a new nsproxy and its associated namespaces and pass it back to
  caller to attach it to right process.

- Changed all copy_*_ns() routines to return a new copy of namespace
  instead of attaching it to task->nsproxy.

- Moved the CAP_SYS_ADMIN checks out of copy_*_ns() routines.

- Removed unnessary !ns checks from copy_*_ns() and added BUG_ON()
  just incase.

- Get rid of all individual unshare_*_ns() routines and make use of
  copy_*_ns() instead.

[akpm@osdl.org: cleanups, warning fix]
[clg@fr.ibm.com: remove dup_namespaces() declaration]
[serue@us.ibm.com: fix CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) retval]
[akpm@linux-foundation.org: fix build with CONFIG_SYSVIPC=n]
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <containers@lists.osdl.org>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:00 -07:00
Nick Piggin
4fc75ff481 exec: fix remove_arg_zero
Petr Tesarik discovered a problem in remove_arg_zero(). He writes:

 When a script is loaded, load_script() replaces argv[0] with the
 name of the interpreter and the filename passed to the exec syscall.
 However, there is no guarantee that the length of the interpreter
 name plus the length of the filename is greater than the length of
 the original argv[0]. If the difference happens to cross a page boundary,
 setup_arg_pages() will call put_dirty_page() [aka install_arg_page()]
 with an address outside the VMA.

 Therefore, remove_arg_zero() must free all pages which would be unused
 after the argument is removed.

So, rewrite the remove_arg_zero function without gotos, with a few comments,
and with the commonly used explicit index/offset. This fixes the problem
and makes it easier to understand as well.

[a.p.zijlstra@chello.nl: add comment]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:00 -07:00
Robert P. J. Day
f87367a6b1 reiserfs: correct misspelled "REISERFS_PROC_INFO" to "CONFIG_REISERFS_PROC_INFO"
Correct the misspelling of the preprocessor check of a Kconfig option to refer
to CONFIG_REISERFS_PROC_INFO and not just the incorrect REISERFS_PROC_INFO.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:00 -07:00
Alexey Dobriyan
fe08a9d498 reiserfs: shrink superblock if no xattrs
This makes in-core superblock fit into one cacheline here.

Before:
    struct dentry *            xattr_root;           /*   124     4 */
    /* --- cacheline 1 boundary (128 bytes) --- */
    struct rw_semaphore        xattr_dir_sem;        /*   128    12 */
    int                        j_errno;              /*   140     4 */
    }; /* size: 144, cachelines: 2 */
       /* sum members: 142, holes: 1, sum holes: 2 */
       /* last cacheline: 16 bytes */

After:
    int                        j_errno;              /*   124     4 */
    /* --- cacheline 1 boundary (128 bytes) --- */
    }; /* size: 128, cachelines: 1 */
       /* sum members: 126, holes: 1, sum holes: 2 */

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:00 -07:00
Dmitriy Monakhov
2d3466a348 reiserfs: possible null pointer dereference during resize
sb_read may return NULL, let's explicitly check it.  If so free new bitmap
blocks array, after this we may safely exit as it done above during bitmap
allocation.

Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:59 -07:00
Dmitriy Monakhov
82f703bb8c freevxfs: possible null pointer dereference fix
sb_read may return NULL, so let's explicitly check it.

Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:59 -07:00
Vignesh Babu BM
1368c4f248 is_power_of_2 in fs/block_dev.c
Replace (n & (n-1)) in the context of power of 2 checks with is_power_of_2

Signed-off-by: vignesh babu <vignesh.babu@wipro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:59 -07:00
Vignesh Babu BM
e1b5c1d3da is_power_of_2 in fs/hfs
Replace (n & (n-1)) in the context of power of 2 checks with is_power_of_2

Signed-off-by: vignesh babu <vignesh.babu@wipro.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:59 -07:00
Vignesh Babu BM
e7d709c096 is_power_of_2 in fat
Replacing (n & (n-1)) in the context of power of 2 checks with
is_power_of_2

Signed-off-by: vignesh babu <vignesh.babu@wipro.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:59 -07:00
Florin Malita
3972b7f67b devpts: add fsnotify create event
Currently, devpts doesn't generate an fsnotify event upon pts creation
because the regular vfs paths aren't involved.  Deallocation, on the other
hand, correctly generates a nameremove event thanks to the d_delete()
invocation in devpts_pty_kill().

This patch adds the missing fsnotify_create() trigger in devpts_pty_new().

Signed-off-by: Florin Malita <fmalita@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:59 -07:00
Chris Snook
1ae7075bcd use use SEEK_MAX to validate user lseek arguments
Add SEEK_MAX and use it to validate lseek arguments from userspace.

Signed-off-by: Chris Snook <csnook@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:59 -07:00
Chris Snook
7b8e89249b use symbolic constants in generic lseek code
Convert magic numbers to SEEK_* values from fs.h

Signed-off-by: Chris Snook <csnook@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:59 -07:00
Andrew Morton
24c32d733d mm: shrink parent dentries when shrinking slab
Teach the dentry slab shrinker to aggressively shrink parent dentries when
shrinking the dentry cache.

This is done to attempt to improve the situation where the dentry slab cache
gets a lot of internal fragmentation due to pages containing directory
dentries.  It is expected that this change will cause some of those dentries
to be reaped earlier, and with less scanning.

Needs careful testing.

Cc: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:58 -07:00
Miklos Szeredi
d52b908646 fix quadratic behavior of shrink_dcache_parent()
The time shrink_dcache_parent() takes, grows quadratically with the depth
of the tree under 'parent'.  This starts to get noticable at about 10,000.

These kinds of depths don't occur normally, and filesystems which invoke
shrink_dcache_parent() via d_invalidate() seem to have other depth
dependent timings, so it's not even easy to expose this problem.

However with FUSE it's easy to create a deep tree and d_invalidate()
will also get called.  This can make a syscall hang for a very long
time.

This is the original discovery of the problem by Russ Cox:

  http://article.gmane.org/gmane.comp.file-systems.fuse.devel/3826

The following patch fixes the quadratic behavior, by optionally allowing
prune_dcache() to prune ancestors of a dentry in one go, instead of doing
it one at a time.

Common code in dput() and prune_one_dentry() is extracted into a new helper
function d_kill().

shrink_dcache_parent() as well as shrink_dcache_sb() are converted to use
the ancestry-pruner option.  Only for shrink_dcache_memory() is this
behavior not desirable, so it keeps using the old algorithm.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Maneesh Soni <maneesh@in.ibm.com>
Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:58 -07:00
William Cohen
97dc32cdb1 reduce size of task_struct on 64-bit machines
This past week I was playing around with that pahole tool
(http://oops.ghostprotocols.net:81/acme/dwarves/) and looking at the size
of various struct in the kernel.  I was surprised by the size of the
task_struct on x86_64, approaching 4K.  I looked through the fields in
task_struct and found that a number of them were declared as "unsigned
long" rather than "unsigned int" despite them appearing okay as 32-bit
sized fields.  On x86_64 "unsigned long" ends up being 8 bytes in size and
forces 8 byte alignment.  Is there a reason there a reason they are
"unsigned long"?

The patch below drops the size of the struct from 3808 bytes (60 64-byte
cachelines) to 3760 bytes (59 64-byte cachelines).  A couple other fields
in the task struct take a signficant amount of space:

struct thread_struct       thread;               688
struct held_lock           held_locks[30];       1680

CONFIG_LOCKDEP is turned on in the .config

[akpm@linux-foundation.org: fix printk warnings]
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:58 -07:00
Markus Rechberger
4d7bf11d64 ext2/3/4: fix file date underflow on ext2 3 filesystems on 64 bit systems
Taken from http://bugzilla.kernel.org/show_bug.cgi?id=5079

signed long ranges from -2.147.483.648 to 2.147.483.647 on x86 32bit

10000011110110100100111110111101 .. -2,082,844,739
10000011110110100100111110111101 ..  2,212,122,557 <- this currently gets
stored on the disk but when converting it to a 64bit signed long value it loses
its sign and becomes positive.

Cc: Andreas Dilger <adilger@dilger.ca>
Cc: <linux-ext4@vger.kernel.org>

Andreas says:

This patch is now treating timestamps with the high bit set as negative
times (before Jan 1, 1970).  This means we lose 1/2 of the possible range
of timestamps (lopping off 68 years before unix timestamp overflow -
now only 30 years away :-) to handle the extremely rare case of setting
timestamps into the distant past.

If we are only interested in fixing the underflow case, we could just
limit the values to 0 instead of storing negative values.  At worst this
will skew the timestamp by a few hours for timezones in the far east
(files would still show Jan 1, 1970 in "ls -l" output).

That said, it seems 32-bit systems (mine at least) allow files to be set
into the past (01/01/1907 works fine) so it seems this patch is bringing
the x86_64 behaviour into sync with other kernels.

On the plus side, we have a patch that is ready to add nanosecond timestamps
to ext3 and as an added bonus adds 2 high bits to the on-disk timestamp so
this extends the maximum date to 2242.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:58 -07:00
Alexey Dobriyan
8948e11f45 Allow access to /proc/$PID/fd after setuid()
/proc/$PID/fd has r-x------ permissions, so if process does setuid(), it
will not be able to access /proc/*/fd/. This breaks fstatat() emulation
in glibc.

open("foo", O_RDONLY|O_DIRECTORY)       = 4
setuid32(65534)                         = 0
stat64("/proc/self/fd/4/bar", 0xbfafb298) = -1 EACCES (Permission denied)

Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Acked-By: Kirill Korotaev <dev@openvz.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:58 -07:00
Andrew Morton
7e4c3690b0 block_write_full_page(): report ENOSPC
block_write_full_page() forgot to propagate ENPSOC into the address_space.

Cc: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:57 -07:00
Guillaume Chazarain
3e9f45bd18 Factor outstanding I/O error handling
Cleanup: setting an outstanding error on a mapping was open coded too many
times.  Factor it out in mapping_set_error().

Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:57 -07:00
Jeff Dike
f1adc05e77 uml: hostfs style fixes
hostfs needed some style goodness.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:57 -07:00
Alberto Bertogli
5822b7faca uml: make hostfs_setattr() support operations on unlinked open files
This patch allows hostfs_setattr() to work on unlinked open files by calling
set_attr() (the userspace part) with the inode's fd.

Without this, applications that depend on doing attribute changes to unlinked
open files will fail.

It works by using the fd versions instead of the path ones (for example
fchmod() instead of chmod(), fchown() instead of chown()) when an fd is
available.

Signed-off-by: Alberto Bertogli <albertito@gmail.com>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:57 -07:00
Dmitriy Monakhov
0ceb331433 mm: move common segment checks to separate helper function
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Monakhov Dmitriy <dmonakhov@openvz.org>
Cc: Christoph Hellwig <hch@lst.de>
Acked-by: Anton Altaparmakov <aia21@cam.ac.uk>
Acked-by: David Chinner <dgc@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:14:57 -07:00
Jens Axboe
86aa5ac53e [PATCH] splice: always call into page_cache_readahead()
Don't try to guess what the read-ahead logic will do, allow it
to make its own decisions.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-05-08 08:46:19 +02:00
Fengguang Wu
9ae9d68cbf [PATCH] splice(): fix interaction with readahead
Eric Dumazet, thank you for disclosing this bug.

Readahead logic somehow fails to populate the page range with data.
It can be because

1) the readahead routine is not always called in the following lines of

fs/splice.c:
        if (!loff || nr_pages > 1)
                page_cache_readahead(mapping, &in->f_ra, in, index, nr_pages);

2) even called, page_cache_readahead() wont guarantee the pages are there.
It wont submit readahead I/O for pages already in the radix tree, or when
(ra_pages == 0), or after 256 cache hits.

In your case, it should be because of the retried reads, which lead to
excessive cache hits, and disables readahead at some time.

And that _one_ failure of readahead blocks the whole read process.
The application receives EAGAIN and retries the read, but
__generic_file_splice_read() refuse to make progress:

- in the previous invocation, it has allocated a blank page and inserted it
  into the radix tree, but never has the chance to start I/O for it: the test
  of SPLICE_F_NONBLOCK goes before that.

- in the retried invocation, the readahead code will neither get out of the
  cache hit mode, nor will it submit I/O for an already existing page.

Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-05-08 08:44:36 +02:00
Lachlan McIlroy
f7c66ce3f7 [XFS] Add lockdep support for XFS
SGI-PV: 963965
SGI-Modid: xfs-linux-melb:xfs-kern:28485a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08 13:50:19 +10:00
Lachlan McIlroy
71dfd5a396 [XFS] Fix race in xfs_write() b/w dmapi callout and direct I/O checks.
In xfs_write() the iolock is dropped and reacquired in XFS_SEND_DATA()
which means that the file could change from not-cached to cached and we
need to redo the direct I/O checks. We should also redo the direct I/O
checks when the file size changes regardless if O_APPEND is set or not.

SGI-PV: 963483
SGI-Modid: xfs-linux-melb:xfs-kern:28440a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08 13:50:12 +10:00
Utako Kusaka
3a02ee1828 [XFS] Get rid of redundant "required" in msg.
SGI-PV: 963466
SGI-Modid: xfs-linux-melb:xfs-kern:28416a

Signed-off-by: Utako Kusaka <utako@tnes.nec.co.jp>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
2007-05-08 13:50:06 +10:00
Tim Shimmin
e6a0e9cdff [XFS] Export via a function xfs_buftarg_list for use by kdb/xfsidbg.
SGI-PV: 963465
SGI-Modid: xfs-linux-melb:xfs-kern:28414a

Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2007-05-08 13:49:59 +10:00
Tim Shimmin
f10bb2dad0 [XFS] Remove unused ilen variable and references.
SGI-PV: 907752
SGI-Modid: xfs-linux-melb:xfs-kern:28344a

Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
2007-05-08 13:49:53 +10:00
Lachlan McIlroy
ba87ea699e [XFS] Fix to prevent the notorious 'NULL files' problem after a crash.
The problem that has been addressed is that of synchronising updates of
the file size with writes that extend a file. Without the fix the update
of a file's size, as a result of a write beyond eof, is independent of
when the cached data is flushed to disk. Often the file size update would
be written to the filesystem log before the data is flushed to disk. When
a system crashes between these two events and the filesystem log is
replayed on mount the file's size will be set but since the contents never
made it to disk the file is full of holes. If some of the cached data was
flushed to disk then it may just be a section of the file at the end that
has holes.

There are existing fixes to help alleviate this problem, particularly in
the case where a file has been truncated, that force cached data to be
flushed to disk when the file is closed. If the system crashes while the
file(s) are still open then this flushing will never occur.

The fix that we have implemented is to introduce a second file size,
called the in-memory file size, that represents the current file size as
viewed by the user. The existing file size, called the on-disk file size,
is the one that get's written to the filesystem log and we only update it
when it is safe to do so. When we write to a file beyond eof we only
update the in- memory file size in the write operation. Later when the I/O
operation, that flushes the cached data to disk completes, an I/O
completion routine will update the on-disk file size. The on-disk file
size will be updated to the maximum offset of the I/O or to the value of
the in-memory file size if the I/O includes eof.

SGI-PV: 958522
SGI-Modid: xfs-linux-melb:xfs-kern:28322a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08 13:49:46 +10:00
Lachlan McIlroy
2a32963130 [XFS] Fix race condition in xfs_write().
This change addresses a race in xfs_write() where, for direct I/O, the
flags need_i_mutex and need_flush are setup before the iolock is acquired.
The logic used to setup the flags may change between setting the flags and
acquiring the iolock resulting in these flags having incorrect values. For
example, if a file is not currently cached then need_i_mutex is set to
zero and then if the file is cached before the iolock is acquired we will
fail to do the flushinval before the direct write.

The flush (and also the call to xfs_zero_eof()) need to be done with the
iolock held exclusive so we need to acquire the iolock before checking for
cached data (or if the write begins after eof) to prevent this state from
changing. For direct I/O I've chosen to always acquire the iolock in
shared mode initially and if there is a need to promote it then drop it
and reacquire it.

There's also some other tidy-ups including removing the O_APPEND offset
adjustment since that work is done in generic_write_checks() (and we don't
use offset as an input parameter anywhere).

SGI-PV: 962170
SGI-Modid: xfs-linux-melb:xfs-kern:28319a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08 13:49:39 +10:00
Kouta Ooizumi
e6d29426bc [XFS] Fix uquota and oquota enforcement problems.
When uquota and oquota (gquota/pquota) are enabled for accounting both are
enforced if ether has enforcement active.

Conditions:

- Both XFS_UQUOTA_ACCT and XFS_GQUOTA_ACCT are enabled.

- Either XFS_UQUOTA_ENFD or XFS_OQUOTA_ENFD is enabled.

- The usage without enforce is reached at the soft limit.

Problems:

1. "repquota" shows all grace time even if no enforcement.

2. we cannot make a file over a hard limits even if no enforcement.

SGI-PV: 962291
SGI-Modid: xfs-linux-melb:xfs-kern:28272a

Signed-off-by: Kouta Ooizumi <k-ooizumi@tnes.nec.co.jp>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08 13:49:33 +10:00
Lachlan McIlroy
d3cf209476 [XFS] propogate return codes from flush routines
This patch handles error return values in fs_flush_pages and
fs_flushinval_pages. It changes the prototype of fs_flushinval_pages so we
can propogate the errors and handle them at higher layers. I also modified
xfs_itruncate_start so that it could propogate the error further.

SGI-PV: 961990
SGI-Modid: xfs-linux-melb:xfs-kern:28231a

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Stewart Smith <stewart@flamingspork.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08 13:49:27 +10:00
Donald Douwsma
424ea91ba6 [XFS] Fix quotaon syscall failures for group enforcement requests.
xfs_qm_scall_quotaon was incorrectly failing requests to enable group
quota enforcement. Fixes logic error in OQUOTA handling.

SGI-PV: 961964
SGI-Modid: xfs-linux-melb:xfs-kern:28227a

Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08 13:49:15 +10:00
Donald Douwsma
646d5bdab3 [XFS] Invalidate quotacheck when mounting without a quota type.
When quotas are mounted or remounted without a particular quota type the
quota accounting for that type becomes invalid. Previously we were
ignoring this leading to accounting errors.

SGI-PV: 961964
SGI-Modid: xfs-linux-melb:xfs-kern:28225a

Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Utako Kusaka <utako@tnes.nec.co.jp>
Signed-off-by: Vlad Apostolov <vapo@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08 13:49:09 +10:00
Joe Perches
e7a23a9b37 [XFS] reducing the number of random number functions.
Patch provided by Joe Perches

SGI-PV: 961696
SGI-Modid: xfs-linux-melb:xfs-kern:28209a

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08 13:49:03 +10:00
Eric Sandeen
e9ed9d2240 [XFS] remove more misc. unused args
Patch provided by Eric Sandeen.

SGI-PV: 961695
SGI-Modid: xfs-linux-melb:xfs-kern:28205a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08 13:48:56 +10:00
Eric Sandeen
ef497f8a1e [XFS] the "aendp" arg to xfs_dir2_data_freescan is always NULL, remove it.
Patch provided by Eric Sandeen.

SGI-PV: 961694
SGI-Modid: xfs-linux-melb:xfs-kern:28204a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08 13:48:49 +10:00
Eric Sandeen
1c72bf9003 [XFS] The last argument "lsn" of xfs_trans_commit() is always called with
NULL.

Patch provided by Eric Sandeen.

SGI-PV: 961693
SGI-Modid: xfs-linux-melb:xfs-kern:28199a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08 13:48:42 +10:00
David Woodhouse
1c97964520 [JFFS2] Simplify and clean up jffs2_add_tn_to_tree() some more.
Fixing at least a couple more bugs in the process.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-05-08 00:19:54 +01:00
Linus Torvalds
2d56d3c43c Merge branch 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux
* 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux:
  gfs2: nfs lock support for gfs2
  lockd: add code to handle deferred lock requests
  lockd: always preallocate block in nlmsvc_lock()
  lockd: handle test_lock deferrals
  lockd: pass cookie in nlmsvc_testlock
  lockd: handle fl_grant callbacks
  lockd: save lock state on deferral
  locks: add fl_grant callback for asynchronous lock return
  nfsd4: Convert NFSv4 to new lock interface
  locks: add lock cancel command
  locks: allow {vfs,posix}_lock_file to return conflicting lock
  locks: factor out generic/filesystem switch from setlock code
  locks: factor out generic/filesystem switch from test_lock
  locks: give posix_test_lock same interface as ->lock
  locks: make ->lock release private data before returning in GETLK case
  locks: create posix-to-flock helper functions
  locks: trivial removal of unnecessary parentheses
2007-05-07 12:34:24 -07:00
Linus Torvalds
5cefcab3db Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (34 commits)
  [GFS2] Uncomment sprintf_symbol calling code
  [DLM] lowcomms style
  [GFS2] printk warning fixes
  [GFS2] Patch to fix mmap of stuffed files
  [GFS2] use lib/parser for parsing mount options
  [DLM] Lowcomms nodeid range & initialisation fixes
  [DLM] Fix dlm_lowcoms_stop hang
  [DLM] fix mode munging
  [GFS2] lockdump improvements
  [GFS2] Patch to detect corrupt number of dir entries in leaf and/or inode blocks
  [GFS2] bz 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump)
  [DLM] fs/dlm/ast.c should #include "ast.h"
  [DLM] Consolidate transport protocols
  [DLM] Remove redundant assignment
  [GFS2] Fix bz 234168 (ignoring rgrp flags)
  [DLM] change lkid format
  [DLM] interface for purge (2/2)
  [DLM] add orphan purging code (1/2)
  [DLM] split create_message function
  [GFS2] Set drop_count to 0 (off) by default
  ...
2007-05-07 12:26:27 -07:00
Bryan Wu
1394f03221 blackfin architecture
This adds support for the Analog Devices Blackfin processor architecture, and
currently supports the BF533, BF532, BF531, BF537, BF536, BF534, and BF561
(Dual Core) devices, with a variety of development platforms including those
avaliable from Analog Devices (BF533-EZKit, BF533-STAMP, BF537-STAMP,
BF561-EZKIT), and Bluetechnix!  Tinyboards.

The Blackfin architecture was jointly developed by Intel and Analog Devices
Inc.  (ADI) as the Micro Signal Architecture (MSA) core and introduced it in
December of 2000.  Since then ADI has put this core into its Blackfin
processor family of devices.  The Blackfin core has the advantages of a clean,
orthogonal,RISC-like microprocessor instruction set.  It combines a dual-MAC
(Multiply/Accumulate), state-of-the-art signal processing engine and
single-instruction, multiple-data (SIMD) multimedia capabilities into a single
instruction-set architecture.

The Blackfin architecture, including the instruction set, is described by the
ADSP-BF53x/BF56x Blackfin Processor Programming Reference
http://blackfin.uclinux.org/gf/download/frsrelease/29/2549/Blackfin_PRM.pdf

The Blackfin processor is already supported by major releases of gcc, and
there are binary and source rpms/tarballs for many architectures at:
http://blackfin.uclinux.org/gf/project/toolchain/frs There is complete
documentation, including "getting started" guides available at:
http://docs.blackfin.uclinux.org/ which provides links to the sources and
patches you will need in order to set up a cross-compiling environment for
bfin-linux-uclibc

This patch, as well as the other patches (toolchain, distribution,
uClibc) are actively supported by Analog Devices Inc, at:
http://blackfin.uclinux.org/

We have tested this on LTP, and our test plan (including pass/fails) can
be found at:
http://docs.blackfin.uclinux.org/doku.php?id=testing_the_linux_kernel

[m.kozlowski@tuxland.pl: balance parenthesis in blackfin header files]
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Aubrey Li <aubrey.li@analog.com>
Signed-off-by: Jie Zhang <jie.zhang@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:58 -07:00
Akinobu Mita
5bc98594d5 hugetlbfs: add NULL check in hugetlb_zero_setup()
If hugetlbfs module_init() fails, hugetlbfs_vfsmount is not initialized and
shmget() with SHM_HUGETLB flag will cause NULL pointer dereference.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:57 -07:00
Christoph Lameter
50953fe9e0 slab allocators: Remove SLAB_DEBUG_INITIAL flag
I have never seen a use of SLAB_DEBUG_INITIAL.  It is only supported by
SLAB.

I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again?  The callback is
performed before each freeing of an object.

I would think that it is much easier to check the object state manually
before the free.  That also places the check near the code object
manipulation of the object.

Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on.  If there would be code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG otherwise it would just be dead code.  But there is no such code
in the kernel.  I think SLUB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e.  add debug code before kfree).

There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches.  Remove the pointless checks (they would even be
pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.

This is the last slab flag that SLUB did not support.  Remove the check for
unimplemented flags from SLUB.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:57 -07:00
Benjamin Herrenschmidt
036e08568c get_unmapped_area handles MAP_FIXED in hugetlbfs
Generic hugetlb_get_unmapped_area() now handles MAP_FIXED by just calling
prepare_hugepage_range()

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Matthew Wilcox <willy@debian.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:57 -07:00
Christoph Lameter
0a31bd5f2b KMEM_CACHE(): simplify slab cache creation
This patch provides a new macro

KMEM_CACHE(<struct>, <flags>)

to simplify slab creation. KMEM_CACHE creates a slab with the name of the
struct, with the size of the struct and with the alignment of the struct.
Additional slab flags may be specified if necessary.

Example

struct test_slab {
	int a,b,c;
	struct list_head;
} __cacheline_aligned_in_smp;

test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC)

will create a new slab named "test_slab" of the size sizeof(struct
test_slab) and aligned to the alignment of test slab.  If it fails then we
panic.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:55 -07:00
Peter Zijlstra
96018fdacb mm: optimize acorn partition truncate
invalidate_bdev() is superfluous when truncate_inode_pages() is also
called.  do call invalidate_bh_lrus() though, to avoid stale pointers.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:55 -07:00
Peter Zijlstra
f9a14399ae mm: optimize kill_bdev()
Remove duplicate work in kill_bdev().

It currently invalidates and then truncates the bdev's mapping.
invalidate_mapping_pages() will opportunistically remove pages from the
mapping.  And truncate_inode_pages() will forcefully remove all pages.

The only thing truncate doesn't do is flush the bh lrus.  So do that
explicitly.  This avoids (very unlikely) but possible invalid lookup
results if the same bdev is quickly re-issued.

It also will prevent extreme kernel latencies which are observed when
blockdevs which have a large amount of pagecache are unmounted, by avoiding
invalidate_mapping_pages() on that path.  invalidate_mapping_pages() has no
cond_resched (it can be called under spinlock), whereas truncate_inode_pages()
has one.

[akpm@linux-foundation.org: restore nrpages==0 optimisation]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:55 -07:00
Peter Zijlstra
f98393a64c mm: remove destroy_dirty_buffers from invalidate_bdev()
Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't
been used in 6 years (so akpm says).

find * -name \*.[ch] | xargs grep -l invalidate_bdev |
while read file; do
	quilt add $file;
	sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file;
done

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:55 -07:00
Christoph Lameter
d85f33855c Make page->private usable in compound pages
If we add a new flag so that we can distinguish between the first page and the
tail pages then we can avoid to use page->private in the first page.
page->private == page for the first page, so there is no real information in
there.

Freeing up page->private makes the use of compound pages more transparent.
They become more usable like real pages.  Right now we have to be careful f.e.
 if we are going beyond PAGE_SIZE allocations in the slab on i386 because we
can then no longer use the private field.  This is one of the issues that
cause us not to support debugging for page size slabs in SLAB.

Having page->private available for SLUB would allow more meta information in
the page struct.  I can probably avoid the 16 bit ints that I have in there
right now.

Also if page->private is available then a compound page may be equipped with
buffer heads.  This may free up the way for filesystems to support larger
blocks than page size.

We add PageTail as an alias of PageReclaim.  Compound pages cannot currently
be reclaimed.  Because of the alias one needs to check PageCompound first.

The RFC for the this approach was discussed at
http://marc.info/?t=117574302800001&r=1&w=2

[nacc@us.ibm.com: fix hugetlbfs]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:53 -07:00
David Rientjes
b813e931b4 smaps: add clear_refs file to clear reference
Adds /proc/pid/clear_refs.  When any non-zero number is written to this file,
pte_mkold() and ClearPageReferenced() is called for each pte and its
corresponding page, respectively, in that task's VMAs.  This file is only
writable by the user who owns the task.

It is now possible to measure _approximately_ how much memory a task is using
by clearing the reference bits with

	echo 1 > /proc/pid/clear_refs

and checking the reference count for each VMA from the /proc/pid/smaps output
at a measured time interval.  For example, to observe the approximate change
in memory footprint for a task, write a script that clears the references
(echo 1 > /proc/pid/clear_refs), sleeps, and then greps for Pgs_Referenced and
extracts the size in kB.  Add the sizes for each VMA together for the total
referenced footprint.  Moments later, repeat the process and observe the
difference.

For example, using an efficient Mozilla:

	accumulated time		referenced memory
	----------------		-----------------
		 0 s				 408 kB
		 1 s				 408 kB
		 2 s				 556 kB
		 3 s				1028 kB
		 4 s				 872 kB
		 5 s				1956 kB
		 6 s				 416 kB
		 7 s				1560 kB
		 8 s				2336 kB
		 9 s				1044 kB
		10 s				 416 kB

This is a valuable tool to get an approximate measurement of the memory
footprint for a task.

Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
[akpm@linux-foundation.org: build fixes]
[mpm@selenic.com: rename for_each_pmd]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:52 -07:00
David Rientjes
f79f177c25 smaps: add pages referenced count to smaps
Adds an additional unsigned long field to struct mem_size_stats called
'referenced'.  For each pte walked in the smaps code, this field is
incremented by PAGE_SIZE if it has pte-reference bits.

An additional line was added to the /proc/pid/smaps output for each VMA to
indicate how many pages within it are currently marked as referenced or
accessed.

Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:52 -07:00
David Rientjes
826fad1b93 smaps: extract pmd walker from smaps code
Extracts the pmd walker from smaps-specific code in fs/proc/task_mmu.c.

The new struct pmd_walker includes the struct vm_area_struct of the memory to
walk over.  Iteration begins at the vma->vm_start and completes at
vma->vm_end.  A pointer to another data structure may be stored in the private
field such as struct mem_size_stats, which acts as the smaps accumulator.  For
each pmd in the VMA, the action function is called with a pointer to its
struct vm_area_struct, a pointer to the pmd_t, its start and end addresses,
and the private field.

The interface for walking pmd's in a VMA for fs/proc/task_mmu.c is now:

	void for_each_pmd(struct vm_area_struct *vma,
			  void (*action)(struct vm_area_struct *vma,
					 pmd_t *pmd, unsigned long addr,
					 unsigned long end,
					 void *private),
			  void *private);

Since the pmd walker is now extracted from the smaps code, smaps_one_pmd() is
invoked for each pmd in the VMA.  Its behavior and efficiency is identical to
the existing implementation.

Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:52 -07:00
Adrian Bunk
ac267728f1 mm/slab.c: proper prototypes
Add proper prototypes in include/linux/slab.h.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:52 -07:00
Nick Piggin
3d67f2d7c0 fs: buffer don't PageUptodate without page locked
__block_write_full_page is calling SetPageUptodate without the page locked.
This is unusual, but not incorrect, as PG_writeback is still set.

However the next patch will require that SetPageUptodate always be called with
the page locked.  Simply don't bother setting the page uptodate in this case
(it is unusual that the write path does such a thing anyway).  Instead just
leave it to the read side to bring the page uptodate when it notices that all
buffers are uptodate.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:51 -07:00
Nick Piggin
6fe6900e1e mm: make read_cache_page synchronous
Ensure pages are uptodate after returning from read_cache_page, which allows
us to cut out most of the filesystem-internal PageUptodate calls.

I didn't have a great look down the call chains, but this appears to fixes 7
possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in
ecryptfs, 1 in jffs2, and a possible cleared data overwritten with readpage in
block2mtd.  All depending on whether the filler is async and/or can return
with a !uptodate page.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:51 -07:00
Adrian Bunk
d2ba27e800 proper prototype for hugetlb_get_unmapped_area()
Add a proper prototype for hugetlb_get_unmapped_area() in
include/linux/hugetlb.h.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:51 -07:00
David Woodhouse
fcf3cafb3e [JFFS2] Remove another bogus optimisation in jffs2_add_tn_to_tree()
We attempted to insert new nodes into the tree by just using
rb_replace_node to let them replace an earlier node which they
completely overlapped. However, that could place the new node into the
wrong place in the tree, since its start could be node only before the
start of the victim, but before the node _before_ the victim in the tree
(if that previous node actually ends _after_ the new node, thus isn't
entirely overlapped and wasn't itself chosen to be the victim).

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-05-07 13:16:13 +01:00
Marc Eshel
586759f03e gfs2: nfs lock support for gfs2
Add NFS lock support to GFS2.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-06 20:38:50 -04:00
Marc Eshel
1a8322b2b0 lockd: add code to handle deferred lock requests
Rewrite nlmsvc_lock() to use the asynchronous interface.

As with testlock, we answer nlm requests in nlmsvc_lock by first looking up
the block and then using the results we find in the block if B_QUEUED is
set, and calling vfs_lock_file() otherwise.

If this a new lock request and we get -EINPROGRESS return on a non-blocking
request then we defer the request.

Also modify nlmsvc_unlock() to call the filesystem method if appropriate.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-05-06 20:38:50 -04:00
Marc Eshel
f812048020 lockd: always preallocate block in nlmsvc_lock()
Normally we could skip ever having to allocate a block in the case where
the client asks for a non-blocking lock, or asks for a blocking lock that
succeeds immediately.

However we're going to want to always look up a block first in order to
check whether we're revisiting a deferred lock call, and to be prepared to
handle the case where the filesystem returns -EINPROGRESS--in that case we
want to make sure the lock we've given the filesystem is the one embedded
in the block that we'll use to track the deferred request.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-05-06 20:38:50 -04:00
Marc Eshel
5ea0d75037 lockd: handle test_lock deferrals
Rewrite nlmsvc_testlock() to use the new asynchronous interface: instead of
immediately doing a posix_test_lock(), we first look for a matching block.
If the subsequent test_lock returns anything other than -EINPROGRESS, we
then remove the block we've found and return the results.

If it returns -EINPROGRESS, then we defer the lock request.

In the case where the block we find in the first step has B_QUEUED set,
we bypass the vfs_test_lock entirely, instead using the block to decide how
to respond:
	with nlm_lck_denied if B_TIMED_OUT is set.
	with nlm_granted if B_GOT_CALLBACK is set.
	by dropping if neither B_TIMED_OUT nor B_GOT_CALLBACK is set

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-05-06 20:38:50 -04:00
Marc Eshel
85f3f1b3f7 lockd: pass cookie in nlmsvc_testlock
Change NLM internal interface to pass more information for test lock; we
need this to make sure the cookie information is pushed down to the place
where we do request deferral, which is handled for testlock by the
following patch.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-05-06 20:38:50 -04:00
Marc Eshel
0e4ac9d935 lockd: handle fl_grant callbacks
Add code to handle file system callback when the lock is finally granted.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-05-06 20:38:50 -04:00
Marc Eshel
2b36f412ab lockd: save lock state on deferral
We need to keep some state for a pending asynchronous lock request, so this
patch adds that state to struct nlm_block.

This also adds a function which defers the request, by calling
rqstp->rq_chandle.defer and storing the resulting deferred request in a
nlm_block structure which we insert into lockd's global block list.  That
new function isn't called yet, so it's dead code until a later patch.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-05-06 20:38:50 -04:00
Marc Eshel
2beb6614f5 locks: add fl_grant callback for asynchronous lock return
Acquiring a lock on a cluster filesystem may require communication with
remote hosts, and to avoid blocking lockd or nfsd threads during such
communication, we allow the results to be returned asynchronously.

When a ->lock() call needs to block, the file system will return
-EINPROGRESS, and then later return the results with a call to the
routine in the fl_grant field of the lock_manager_operations struct.

This differs from the case when ->lock returns -EAGAIN to a blocking
lock request; in that case, the filesystem calls fl_notify when the lock
is granted, and the caller retries the original lock.  So while
fl_notify is merely a hint to the caller that it should retry, fl_grant
actually communicates the final result of the lock operation (with the
lock already acquired in the succesful case).

Therefore fl_grant takes a lock, a status and, for the test lock case, a
conflicting lock.  We also allow fl_grant to return an error to the
filesystem, to handle the case where the fl_grant requests arrives after
the lock manager has already given up waiting for it.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-05-06 20:38:49 -04:00
Marc Eshel
fd85b8170d nfsd4: Convert NFSv4 to new lock interface
Convert NFSv4 to the new lock interface.  We don't define any callback for now,
so we're not taking advantage of the asynchronous feature--that's less critical
for the multi-threaded nfsd then it is for the single-threaded lockd.  But this
does allow a cluster filesystems to export cluster-coherent locking to NFS.

Note that it's cluster filesystems that are the issue--of the filesystems that
define lock methods (nfs, cifs, etc.), most are not exportable by nfsd.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-05-06 20:38:49 -04:00
Marc Eshel
9b9d2ab415 locks: add lock cancel command
Lock managers need to be able to cancel pending lock requests.  In the case
where the exported filesystem manages its own locks, it's not sufficient just
to call posix_unblock_lock(); we need to let the filesystem know what's
happening too.

We do this by adding a new fcntl lock command: FL_CANCELLK.  Some day this
might also be made available to userspace applications that could benefit from
an asynchronous locking api.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-05-06 20:38:28 -04:00
Marc Eshel
150b393456 locks: allow {vfs,posix}_lock_file to return conflicting lock
The nfsv4 protocol's lock operation, in the case of a conflict, returns
information about the conflicting lock.

It's unclear how clients can use this, so for now we're not going so far as to
add a filesystem method that can return a conflicting lock, but we may as well
return something in the local case when it's easy to.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-05-06 19:23:24 -04:00
Marc Eshel
7723ec9777 locks: factor out generic/filesystem switch from setlock code
Factor out the code that switches between generic and filesystem-specific lock
methods; eventually we want to call this from lock managers (lockd and nfsd)
too; currently they only call the generic methods.

This patch does that for all the setlk code.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-05-06 18:08:49 -04:00
J. Bruce Fields
3ee17abd14 locks: factor out generic/filesystem switch from test_lock
Factor out the code that switches between generic and filesystem-specific lock
methods; eventually we want to call this from lock managers (lockd and nfsd)
too; currently they only call the generic methods.

This patch does that for test_lock.

Note that this hasn't been necessary until recently, because the few
filesystems that define ->lock() (nfs, cifs...) aren't exportable via NFS.
However GFS (and, in the future, other cluster filesystems) need to implement
their own locking to get cluster-coherent locking, and also want to be able to
export locking to NFS (lockd and NFSv4).

So we accomplish this by factoring out code such as this and exporting it for
the use of lockd and nfsd.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-05-06 18:06:44 -04:00
Marc Eshel
9d6a8c5c21 locks: give posix_test_lock same interface as ->lock
posix_test_lock() and ->lock() do the same job but have gratuitously
different interfaces.  Modify posix_test_lock() so the two agree,
simplifying some code in the process.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-05-06 17:39:00 -04:00
J. Bruce Fields
70cc6487a4 locks: make ->lock release private data before returning in GETLK case
The file_lock argument to ->lock is used to return the conflicting lock
when found.  There's no reason for the filesystem to return any private
information with this conflicting lock, but nfsv4 is.

Fix nfsv4 client, and modify locks.c to stop calling fl_release_private
for it in this case.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Cc: "Trond Myklebust" <Trond.Myklebust@netapp.com>"
2007-05-06 17:38:19 -04:00
David Woodhouse
96dd8d25d1 [JFFS2] Remove broken insert_point optimisation in jffs2_add_tn_to_tree()
The original code would remember, during the first pass over the tree,
a suitable place to start the insertion from when we eventually come
to add a new node.

The optimisation was broken, and we sometimes ended up inserting a new
node in the wrong place because we started the insertion from the wrong
point.

Just ditch the optimisation and start the insertion from the root of the
tree, for now. I'll try it again when I'm feeling cleverer.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-05-06 14:41:40 +01:00
Linus Torvalds
b7405e1643 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [CIFS] Fix typo in cifs readme from previous commit
  [CIFS] Make sec=none force an anonymous mount
  [CIFS] Change semaphore to mutex for cifs lock_sem
  [CIFS] Fix oops in reset_cifs_unix_caps on reconnect
  [CIFS] UID/GID override on CIFS mounts to Samba
  [CIFS] prefixpath mounts to servers supporting posix paths used wrong slash
  [CIFS] Update cifs version to 1.49
  [CIFS] Replace kmalloc/memset combination with kzalloc
  [CIFS]  Add IPv6 support
  [CIFS] New CIFS POSIX mkdir performance improvement (part 2)
  [CIFS] New CIFS POSIX mkdir performance improvement
  [CIFS] Add write perm for usr to file on windows should remove r/o dos attr
  [CIFS] Remove unnecessary parm to cifs_reopen_file
  [CIFS] Switch cifsd to kthread_run from kernel_thread
  [CIFS] Remove unnecessary checks
2007-05-05 15:30:53 -07:00
Steve French
0ec54aa8af [CIFS] Fix typo in cifs readme from previous commit
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-05-05 22:08:06 +00:00
Linus Torvalds
ea62ccd00f Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (231 commits)
  [PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall
  [PATCH] i386: type may be unused
  [PATCH] i386: Some additional chipset register values validation.
  [PATCH] i386: Add missing !X86_PAE dependincy to the 2G/2G split.
  [PATCH] x86-64: Don't exclude asm-offsets.c in Documentation/dontdiff
  [PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu
  [PATCH] i386: white space fixes in i387.h
  [PATCH] i386: Drop noisy e820 debugging printks
  [PATCH] x86-64: Fix allnoconfig error in genapic_flat.c
  [PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems
  [PATCH] x86-64: Share identical video.S between i386 and x86-64
  [PATCH] x86-64: Remove CONFIG_REORDER
  [PATCH] x86-64: Print type and size correctly for unknown compat ioctls
  [PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0)
  [PATCH] i386: Little cleanups in smpboot.c
  [PATCH] x86-64: Don't enable NUMA for a single node in K8 NUMA scanning
  [PATCH] x86: Use RDTSCP for synchronous get_cycles if possible
  [PATCH] i386: Add X86_FEATURE_RDTSCP
  [PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386
  [PATCH] i386: Implement alternative_io for i386
  ...

Fix up trivial conflict in include/linux/highmem.h manually.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-05 14:55:20 -07:00
Dave Kleikamp
05ec9e26be JFS: Fix race waking up jfsIO kernel thread
It's possible for a journal I/O request to be added to the log_redrive
queue and the jfsIO thread to be awakened after the thread releases
log_redrive_lock but before it sets its state to TASK_INTERRUPTIBLE.

The jfsIO thread should set the state before giving up the spinlock, so
the waking thread will really wake it.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2007-05-05 14:24:05 -05:00
David Woodhouse
1123e2a859 [JFFS2] Remember to calculate overlap on nodes which replace older nodes
This fixes a problem Artem found with the integck test tool -- we
weren't correctly keeping track of the 'overlap' flag in some cases,
which led to the nodes being played back in an incorrect order and file
corruption.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-05-05 16:29:34 +01:00
David Woodhouse
3fddb6c985 [JFFS2] Don't advance c->wbuf_ofs to next eraseblock after wbuf flush
After flushing the last page of an eraseblock, don't leave the
wbuf 'offset' field pointing at the start of the next physical
eraseblock. This was causing a BUG() on NOR-ECC (Sibley) flash, where
we start writing a little further in, after the cleanmarker.

Debugged by Alexander Belyakov <abelyako@googlemail.com>

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-05-05 09:52:49 +01:00
Linus Torvalds
fa24aa561a Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: Force use of GFP_NOFS in ocfs2_write()
  ocfs2: fix sparse warnings in fs/ocfs2/cluster
  ocfs2: fix sparse warnings in fs/ocfs2/dlm
  ocfs2: fix sparse warnings in fs/ocfs2
  [PATCH] Copy i_flags to ocfs2 inode flags on write
  [PATCH] ocfs2: use __set_current_state()
  ocfs2: Wrap access of directory allocations with ip_alloc_sem.
  [PATCH] fs/ocfs2/: make 3 functions static
  ocfs2: Implement compat_ioctl()
2007-05-04 20:44:54 -07:00
Jeff Layton
8426c39c12 [CIFS] Make sec=none force an anonymous mount
We had a customer report that attempting to make CIFS mount with a null
username (i.e. doing an anonymous mount) doesn't work. Looking through the
code, it looks like CIFS expects a NULL username from userspace in order
to trigger an anonymous mount. The mount.cifs code doesn't seem to ever
pass a null username to the kernel, however.

It looks also like the kernel can take a sec=none option, but it only seems
to look at it if the username is already NULL. This seems redundant and
effectively makes sec=none useless.

The following patch makes sec=none force an anonymous mount.

Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-05-05 03:27:49 +00:00
Linus Torvalds
4d4700707c Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (28 commits)
  NFS: Fix a compile glitch on 64-bit systems
  NFS: Clean up nfs_create_request comments
  spkm3: initialize hash
  spkm3: remove bad kfree, unnecessary export
  spkm3: fix spkm3's use of hmac
  NFS4: invalidate cached acl on setacl
  NFS: Fix directory caching problem - with test case and patch.
  NFS: Set meaningful value for fattr->time_start in readdirplus results.
  NFS: Added support to turn off the NFSv3 READDIRPLUS RPC.
  SUNRPC: RPC client should retry with different versions of rpcbind
  SUNRPC: remove old portmapper
  NFS: switch NFSROOT to use new rpcbind client
  SUNRPC: switch the RPC server to use the new rpcbind registration API
  SUNRPC: switch socket-based RPC transports to use rpcbind
  SUNRPC: introduce rpcbind: replacement for in-kernel portmapper
  SUNRPC: Eliminate side effects from rpc_malloc
  SUNRPC: RPC buffer size estimates are too large
  NLM: Shrink the maximum request size of NLM4 requests
  NFS: Use pgoff_t in structures and functions that pass page cache offsets
  NFS: Clean up nfs_sync_mapping_wait()
  ...
2007-05-04 19:55:11 -07:00
Linus Torvalds
7e20ef030d Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (49 commits)
  [SCTP]: Set assoc_id correctly during INIT collision.
  [SCTP]: Re-order SCTP initializations to avoid race with sctp_rcv()
  [SCTP]: Fix the SO_REUSEADDR handling to be similar to TCP.
  [SCTP]: Verify all destination ports in sctp_connectx.
  [XFRM] SPD info TLV aggregation
  [XFRM] SAD info TLV aggregationx
  [AF_RXRPC]: Sort out MTU handling.
  [AF_IUCV/IUCV] : Add missing section annotations
  [AF_IUCV]: Implementation of a skb backlog queue
  [NETLINK]: Remove bogus BUG_ON
  [IPV6]: Some cleanups in include/net/ipv6.h
  [TCP]: zero out rx_opt in tcp_disconnect()
  [BNX2]: Fix TSO problem with small MSS.
  [NET]: Rework dev_base via list_head (v3)
  [TCP] Highspeed: Limited slow-start is nowadays in tcp_slow_start
  [BNX2]: Update version and reldate.
  [BNX2]: Print bus information for PCIE devices.
  [BNX2]: Add 1-shot MSI handler for 5709.
  [BNX2]: Restructure PHY event handling.
  [BNX2]: Add indirect spinlock.
  ...
2007-05-04 19:36:58 -07:00
Trond Myklebust
84dde76c4a NFS: Fix a compile glitch on 64-bit systems
fs/nfs/pagelist.c:226: error: conflicting types for 'nfs_pageio_init'
include/linux/nfs_page.h:80: error: previous declaration of 'nfs_pageio_init' was here

Thanks to Andrew for spotting this...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-04 14:44:06 -04:00
Pavel Emelianov
7562f876cd [NET]: Rework dev_base via list_head (v3)
Cleanup of dev_base list use, with the aim to simplify making device
list per-namespace. In almost every occasion, use of dev_base variable
and dev->next pointer could be easily replaced by for_each_netdev
loop. A few most complicated places were converted to using
first_netdev()/next_netdev().

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 15:13:45 -07:00
David Howells
ec9c948546 [AFS]: Adjust the new netdevice scanning code
Adjust the new netdevice scanning code provided by Patrick McHardy:

 (1) Restore the function banner comments that were dropped.

 (2) Rather than using an array size of 6 in some places and an array size of
     ETH_ALEN in others, pass a pointer instead and pass the array size
     through so that we can actually check it.

 (3) Do the buffer fill count check before checking the for_primary_ifa
     condition again.  This permits us to skip that check should maxbufs be
     reached before we run out of interfaces.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:29:41 -07:00
Patrick McHardy
dc1f6bff6a [AFS]: Replace rtnetlink client by direct dev_base walking
Replace the large and complicated rtnetlink client by two simple
functions for getting the MAC address for the first ethernet device
and building a list of IPv4 addresses.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:28:49 -07:00
Patrick McHardy
5b35fad9d4 [AFS]: Fix memory leak in SRXAFSCB_GetCapabilities
The interface array is not freed on exit.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:27:39 -07:00
David Howells
fbb3fcba72 [AFS]: Fix use of __exit functions from __init path
Fix use of __exit functions from __init path.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:12:46 -07:00
David Howells
80c72fe415 [AFS/AF_RXRPC]: Miscellaneous fixes.
Make miscellaneous fixes to AFS and AF_RXRPC:

 (*) Make AF_RXRPC select KEYS rather than RXKAD or AFS_FS in Kconfig.

 (*) Don't use FS_BINARY_MOUNTDATA.

 (*) Remove a done 'TODO' item in a comemnt on afs_get_sb().

 (*) Don't pass a void * as the page pointer argument of kmap_atomic() as this
     breaks on m68k.  Patch from Geert Uytterhoeven <geert@linux-m68k.org>.

 (*) Use match_*() functions rather than doing my own parsing.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:11:29 -07:00
Roland Dreier
796e5661f6 [CIFS] Change semaphore to mutex for cifs lock_sem
Originally at http://lkml.org/lkml/2006/9/2/86

The recent change to "allow Windows blocking locks to be cancelled via a
CANCEL_LOCK call" introduced a new semaphore in struct cifsFileInfo,
lock_sem.  However, semaphores used as mutexes are deprecated these days,
and there's no reason to add a new one to the kernel.  Therefore, convert
lock_sem to a struct mutex (and also fix one indentation glitch on one of
the lines changed anyway).

Signed-off-by: Roland Dreier <roland@digitalvampire.org>
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-05-03 04:33:45 +00:00
Steve French
0b2365f826 [CIFS] Fix oops in reset_cifs_unix_caps on reconnect
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-05-03 04:30:13 +00:00
Greg Kroah-Hartman
823bccfc40 remove "struct subsystem" as it is no longer needed
We need to work on cleaning up the relationship between kobjects, ksets and
ktypes.  The removal of 'struct subsystem' is the first step of this,
especially as it is not really needed at all.

Thanks to Kay for fixing the bugs in this patch.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 18:57:59 -07:00
Randy Dunlap
2609e7b9be sysfs: printk format warning
Fix sysfs printk format warning:
fs/sysfs/bin.c:62: warning: format '%d' expects type 'int', but argument 4 has type 'size_t'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 18:57:59 -07:00
Mark Fasheh
9315f130e1 ocfs2: Force use of GFP_NOFS in ocfs2_write()
We can otherwise recurse into the file system.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02 15:08:34 -07:00
Mark Fasheh
5fdf1e6771 ocfs2: fix sparse warnings in fs/ocfs2/cluster
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02 15:08:23 -07:00
Mark Fasheh
a7d25539fd ocfs2: fix sparse warnings in fs/ocfs2/dlm
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02 15:08:15 -07:00
Mark Fasheh
1ca1a111b1 ocfs2: fix sparse warnings in fs/ocfs2
None of these are actually harmful, but the noise makes looking for real
problems difficult.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02 15:08:08 -07:00
Jan Kara
6e4b0d5692 [PATCH] Copy i_flags to ocfs2 inode flags on write
Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into
ocfs2-specific ip_attr. Hence, when someone sets these flags via a different
interface than ioctl, they are stored correctly.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02 15:07:58 -07:00
Milind Arun Choudhary
5c2c9d383e [PATCH] ocfs2: use __set_current_state()
use __set_current_state(TASK_*) instead of current->state = TASK_*, in
fs/ocfs2

Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02 15:07:50 -07:00
Joel Becker
ee19a77956 ocfs2: Wrap access of directory allocations with ip_alloc_sem.
OCFS2_I(inode)->ip_alloc_sem is a read-write semaphore protecting
local concurrent access of ocfs2 inodes.  However, ocfs2 directories were
not taking the semaphore while they accessed or modified the allocation
tree.

ocfs2_extend_dir() needs to take the semaphore in a write mode when it
adds to the allocation.  All other directory users get there via
ocfs2_bread(), which takes the semaphore in read mode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02 15:07:42 -07:00
Adrian Bunk
6cb129f567 [PATCH] fs/ocfs2/: make 3 functions static
This patch makes the following needlessly global functions static:
- aops.c: ocfs2_write_data_page()
- dlmglue.c: ocfs2_dump_meta_lvb_info()
- file.c: ocfs2_set_inode_size()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02 15:07:27 -07:00
Mark Fasheh
586d232b19 ocfs2: Implement compat_ioctl()
We need this to support 32 bit system calls on 64 bit kernels.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02 15:07:16 -07:00
Andi Kleen
2724b6db66 [PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems
vfat implements compat handlers for these ioctls, but when they
were executed on other file systems the kernel would still complain
about an unknown compat ioctl.  Just declare them as compatible
and let them be rejected when not needed by the normal path.

This makes wine runs a lot quieter

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:21 +02:00
Andi Kleen
a106009bdf [PATCH] x86-64: Print type and size correctly for unknown compat ioctls
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:21 +02:00
Andi Kleen
9d016dd43b [PATCH] x86-64: Shut up 32bit emulation for SIOCGIFCOUNT
The kernel doesn't implement it, but some programs like java use it
anyways. Shut the code up.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Andi Kleen
421f028100 [PATCH] x86-64: Define IGNORE_IOCTL() macro for compat_ioctls
Define a new IGNORE_IOCTL() to let a compat ioctl not be warned about even when
it is not implemented.

This is the same as COMPATIBLE_IOCTL internally, but better self documentng.

Valid reasons to use this:
- It is implemented with ->compat_ioctl on some device, but programs
  call it on others too.
- The ioctl is not implemented in the native kernel, but programs
  call it commonly anyways.
Most other reasons are not valid.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Ian Campbell
79e030114a [PATCH] i386: Allow i386 crash kernels to handle x86_64 dumps
The specific case I am encountering is kdump under Xen with a 64 bit
hypervisor and 32 bit kernel/userspace.  The dump created is 64 bit due to
the hypervisor but the dump kernel is 32 bit for maximum compatibility.

It's possibly less likely to be useful in a purely native scenario but I
see no reason to disallow it.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Ian Campbell <ian.campbell@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Horms <horms@verge.net.au>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02 19:27:09 +02:00
Jason Uhlenkott
a19b89cad5 NFS: Clean up nfs_create_request comments
Remove some stale comments about hard limits which went away in 2.5.

Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-02 07:37:29 -07:00
J. Bruce Fields
08efa202eb NFS4: invalidate cached acl on setacl
The ACL that the server sets may not be exactly the one we set--for
example, it may silently turn off bits that it does not support.  So we
should remove any cached ACL so that any subsequent request for the ACL
will go to the server.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-02 07:36:09 -07:00
David Woodhouse
7c96b7a146 [JFFS2] Remove dead file histo_mips.h
Its contents were subsumed into compr_rubin.c in a previous
commit, but I forgot to git-rm it.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-05-02 08:36:21 +01:00
Steven Whitehouse
37fde8ca6c [GFS2] Uncomment sprintf_symbol calling code
Now that the patch from -mm has gone upstream, we can uncomment the code
in GFS2 which uses sprintf_symbol.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Robert Peterson <rpeterso@redhat.com>
2007-05-01 09:51:39 +01:00
David Teigland
617e82e10c [DLM] lowcomms style
Replace some printk with log_print, and fix some simple cases of lines
over 80.  Also, return -ENOTCONN if lowcomms_start fails due to no local
IP address being available.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:51 +01:00
akpm@linux-foundation.org
f391a4ead6 [GFS2] printk warning fixes
alpha:

fs/gfs2/dir.c: In function 'gfs2_dir_read_leaf':
fs/gfs2/dir.c:1322: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'sector_t'
fs/gfs2/dir.c: In function 'gfs2_dir_read':
fs/gfs2/dir.c:1455: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type '__u64'

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:48 +01:00
Steven Whitehouse
bf126aee6d [GFS2] Patch to fix mmap of stuffed files
If a stuffed file is mmaped and a page fault is generated at some offset
above the initial page, we need to create a zero page to hang the buffer
heads off before we can unstuff the file. This is a fix for bz #236087

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:46 +01:00
Josef Bacik
476c006be0 [GFS2] use lib/parser for parsing mount options
This patch converts the mount option parsing to use the kernels lib/parser stuff
like all of the other filesystems.  I tested this and it works well.  Thank you,

Signed-off-by: Josef Bacik <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:43 +01:00
Patrick Caulfield
30d3a2373f [DLM] Lowcomms nodeid range & initialisation fixes
Fix a few range & initialization bugs in lowcomms.
- max_nodeid is really the highest nodeid encountered, so all loops must include
it in their iterations.
- clean dlm_local_count & connection_idr so we can do a clean restart.
- Remove a spurious BUG_ON

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:41 +01:00
Josef Bacik
2439fe5072 [DLM] Fix dlm_lowcoms_stop hang
When you attempt to release a lockspace in DLM, it will hang trying to down a
semaphore that has already been downed.  The attached patch fixes the problem.

Signed-off-by: Josef Bacik <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Patrick Caulfield <pcaulfie@redhat.com>
2007-05-01 09:11:38 +01:00
David Teigland
7d3c1feb80 [DLM] fix mode munging
There are flags to enable two specialized features in the dlm:
1. CONVDEADLK causes the dlm to resolve conversion deadlocks internally by
   changing the granted mode of locks to NL.
2. ALTPR/ALTCW cause the dlm to change the requested mode of locks to PR
   or CW to grant them if the normal requested mode can't be granted.

GFS direct i/o exercises both of these features, especially when mixed
with buffered i/o.  The dlm has problems with them.

The first problem is on the master node. If it demotes a lock as a part of
converting it, the actual step of converting the lock isn't being done
after the demotion, the lock is just left sitting on the granted queue
with a granted mode of NL.  I think the mistaken assumption was that the
call to grant_pending_locks() would grant it, but that function naturally
doesn't look at locks on the granted queue.

The second problem is on the process node.  If the master either demotes
or gives an altmode, the munging of the gr/rq modes is never done in the
process copy of the lock, leaving the master/process copies out of sync.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:36 +01:00
Robert Peterson
5f8820960c [GFS2] lockdump improvements
The patch below consists of the following changes (in code order):

1. I fixed a minor compiler warning regarding the printing of
   a kernel symbol address.
2. I implemented a suggestion from Dave Teigland that moves
   the debugfs information for gfs2 into a subdirectory so
   we can easily expand our use of debugfs in the future.
   The current code keeps the glock information in:
   /debug/gfs2/<fs>
   With the patch, the new code keeps the glock information in:
   /debug/gfs2/<fs>/glock
   That will allow us to create more debugfs files in the future.
3. This fixes a bug whereby a failed mount attempt causes the
   debugfs file to not be deleted.  Failed mount attempts should
   always clean up after themselves, including deleting the
   debugfs file and/or directory.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:33 +01:00
Steven Whitehouse
bdd19a22f8 [GFS2] Patch to detect corrupt number of dir entries in leaf and/or inode blocks
This patch detects when the number of entries in a leaf block or inode
block (in the case of stuffed directories) is corrupt and informs the
user. It prevents us from running off the end of the array thats been
allocated for the sorting in this case,

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:30 +01:00
Robert Peterson
7a0079d9e3 [GFS2] bz 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump)
This is for Bugzilla Bug 236008: Kernel gpf doing cat /debugfs/gfs2/xxx
(lock dump) seen at the "gfs2 summit".  This also fixes the bug that caused
garbage to be printed by the "initialized at" field.  I apologize for the
kludge, but that code will all be ripped out anyway when the official
sprint_symbol function becomes available in the Linux kernel.  I also
changed some formatting so that spaces are replaced by proper tabs.

Signed-off-by: Robert Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:28 +01:00
Adrian Bunk
8fa1de386f [DLM] fs/dlm/ast.c should #include "ast.h"
Every file should include the headers containing the prototypes for
it's global functions.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:25 +01:00
Patrick Caulfield
6ed7257b46 [DLM] Consolidate transport protocols
This patch consolidates the TCP & SCTP protocols for the DLM into a single file
and makes it switchable at run-time (well, at least before the DLM actually
starts up!)

For RHEL5 this patch requires Neil Horman's patch that expands the in-kernel
socket API but that has already been twice ACKed so it should be OK.

The patch adds a new lowcomms.c file that replaces the existing lowcomms-sctp.c
& lowcomms-tcp.c files.

Signed-off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:23 +01:00
Patrick Caulfield
fc7c44f03d [DLM] Remove redundant assignment
This patch removes a redundant (and incorrect) assignment from compat_output

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:20 +01:00
Steven Whitehouse
a43a49066d [GFS2] Fix bz 234168 (ignoring rgrp flags)
Ths following patch makes GFS2 use the rgrp flags properly. Although
there are also separate flags for both data and metadata as well, I've
not implemented these as there seems little use for them. On the
otherhand, the "noalloc" flag is generally useful for future changes we
might which to make, so this ensures that we interpret it correctly.

In addition I fixed the comment above the function which was incorrect.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:17 +01:00
David Teigland
ce03f12b37 [DLM] change lkid format
A lock id is a uint32 and is used as an opaque reference to the lock.  For
userland apps, the lkid is passed up, through libdlm, as the return value
from a write() on the dlm device.  This created a problem when the high
bit was 1, making the lkid look like an error.  This is fixed by changing
how the lkid is composed.  The low 16 bits identified the hash bucket for
the lock and the high 16 bits were a per-bucket counter (which eventually
hit 0x8000 causing the problem).  These are simply swapped around; the
number of hash table buckets is far below 0x8000, making all lkid's
positive when viewed as signed.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:15 +01:00
David Teigland
72c2be776b [DLM] interface for purge (2/2)
Add code to accept purge commands from userland.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:12 +01:00
David Teigland
8499137d4e [DLM] add orphan purging code (1/2)
Add code for purging orphan locks.  A process can also purge all of its
own non-orphan locks by passing a pid of zero.  Code already exists for
processes to create persistent locks that become orphans when the process
exits, but the complimentary capability for another process to then purge
these orphans has been missing.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:10 +01:00
David Teigland
7e4dac3359 [DLM] split create_message function
This splits the current create_message() function into two parts so that
later patches can call the new lower-level _create_message() function when
they don't have an rsb struct.  No functional change in this patch.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:07 +01:00
Steven Whitehouse
f01963f264 [GFS2] Set drop_count to 0 (off) by default
This sets the drop_count to 0 by default which is a better default
for most people.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:05 +01:00
David Teigland
b9af8a788a [GFS2] use log_error before LM_OUT_ERROR
We always want to see the details of the error returned to gfs, but
log_debug is often turned off, so use log_error (printk).

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:02 +01:00
David Teigland
ef0c2bb05f [DLM] overlapping cancel and unlock
Full cancel and force-unlock support.  In the past, cancel and force-unlock
wouldn't work if there was another operation in progress on the lock.  Now,
both cancel and unlock-force can overlap an operation on a lock, meaning there
may be 2 or 3 operations in progress on a lock in parallel.  This support is
important not only because cancel and force-unlock are explicit operations
that an app can use, but both are used implicitly when a process exits while
holding locks.

Summary of changes:

- add-to and remove-from waiters functions were rewritten to handle situations
  with more than one remote operation outstanding on a lock

- validate_unlock_args detects when an overlapping cancel/unlock-force
  can be sent and when it needs to be delayed until a request/lookup
  reply is received

- processing request/lookup replies detects when cancel/unlock-force
  occured during the op, and carries out the delayed cancel/unlock-force

- manipulation of the "waiters" (remote operation) state of a lock moved under
  the standard rsb mutex that protects all the other lock state

- the two recovery routines related to locks on the waiters list changed
  according to the way lkb's are now locked before accessing waiters state

- waiters recovery detects when lkb's being recovered have overlapping
  cancel/unlock-force, and may not recover such locks

- revert_lock (cancel) returns a value to distinguish cases where it did
  nothing vs cases where it actually did a cancel; the cancel completion ast
  should only be done when cancel did something

- orphaned locks put on new list so they can be found later for purging

- cancel must be called on a lock when making it an orphan

- flag user locks (ENDOFLIFE) at the end of their useful life (to the
  application) so we can return an error for any further cancel/unlock-force

- we weren't setting COMP/BAST ast flags if one was already set, so we'd lose
  either a completion or blocking ast

- clear an unread bast on a lock that's become unlocked

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:11:00 +01:00
Patrick Caulfield
0320672702 [DLM] fix coverity-spotted stupidity
Replacement patch to remove redundant code rather than moving it around.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:57 +01:00
Robert Peterson
04b933f27b [GFS2] Red Hat bz 228540: owner references
In Testing the previously posted and accepted patch for
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=228540
I uncovered some gfs2 badness.  It turns out that the current
gfs2 code saves off a process pointer when glocks is taken
in both the glock and glock holder structures.  Those
structures will persist in memory long after the process has
ended; pointers to poisoned memory.

This problem isn't caused by the 228540 fix; the new capability
introduced by the fix just uncovered the problem.

I wrote this patch that avoids saving process pointers
and instead saves off the process pid.  Rather than
referencing the bad pointers, it now does process lookups.
There is special code that makes the output nicer for
printing holder information for processes that have ended.

This patch also adds a stub for the new "sprint_symbol"
function that exists in Andrew Morton's -mm patch set, but
won't go into the base kernel until 2.6.22, since it adds
functionality but doesn't fix a bug.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:55 +01:00
Benjamin Marzinski
172e045a7f [GFS2] flush the log if a transaction can't allocate space
This is a fix for bz #208514. When GFS2 frees up space, the freed blocks
aren't available for reuse until the resource group is successfully written
to the ondisk journal. So in rare cases, GFS2 operations will fail, saying
that the filesystem is out of space, when in reality, you are just waiting for
a log flush. For instance, on a 1Gig filesystem, if I continually write 10 Mb
to a file, and then truncate it, after a hundred interations, the write will
fail with -ENOSPC, even though the filesystem is just 1% full.

The attached patch calls a log flush in these cases.  I tested this patch
fairly heavily to check if there were any locking issues that I missed, and
it seems to work just fine. Also, this patch only does the log flush if
get_local_rgrp makes a complete loop of resource groups without skipping
any do to locking issues. The code would be slightly simpler if it just always
did the log flush after the first failed pass, and you could only ever have
to go through the loop twice, instead of up to three times. However, I guessed
that failing to find a rg simply do to locking issues would be common enough
to skip the log flush in that case, but I'm not certain that this is the right
way to go. Either way, I don't suppose this code will be hit all that often.

Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:52 +01:00
Benjamin Marzinski
6883562588 [GFS2] Fix log entry list corruption
When glock_lo_add and rg_lo_add attempt to add an element to the log, they
check to see if has already been added before locking the log. If another
process adds that element to the log in this window between the check and
locking the log, the element will be added to the list twice. This causes
the log element list to become corrupted in such a way that the log element
can never be successfully removed from the list. This patch pulls the
list_empty() check inside the log lock, to remove this window.

Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:50 +01:00
Steven Whitehouse
f35ac346bc [GFS2] Speed up lock_dlm's locking (move sprintf)
The following patch speeds up lock_dlm's locking by moving the sprintf
out from the lock acquisition path and into the lock creation path. This
reduces the amount of CPU time used in acquiring locks by a fair amount.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: David Teigland <teigland@redhat.com>
2007-05-01 09:10:47 +01:00
Patrick Caulfield
254da030df [DLM] Don't delete misc device if lockspace removal fails
Currently if the lockspace removal fails the misc device associated with a
lockspace is left deleted. After that there is no way to access the orphaned
lockspace from userland.

This patch recreates the misc device if th dlm_release_lockspace fails. I
believe this is better than attempting to remove the lockspace first because
that leaves an unattached device lying around. The potential gap in which there
is no access to the lockspace between removing the misc device and recreating it
is acceptable ... after all the application is trying to remove it, and only new
users of the lockspace will be affected.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:44 +01:00
Steven Whitehouse
420d2a1028 [GFS2] Fix a bug on i386 due to evaluation order
Since gcc didn't evaluate the last two terms of the expression in
glock.c:1881 as a constant expression, it resulted in an error on
i386 due to the lack of a 64bit divide instruction. This adds some
brackets to fix the problem.

This was reported by Andrew Morton.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
2007-05-01 09:10:42 +01:00
Steven Whitehouse
3b8249f617 [GFS2] Fix bz 224480 and cleanup glock demotion code
This patch prevents the printing of a warning message in cases where
the fs is functioning normally by handing off responsibility for
unlinked, but still open inodes, to another node for eventual deallocation.
Also, there is now an improved system for ensuring that such requests
to other nodes do not get lost. The callback on the iopen lock is
only ever called when i_nlink == 0 and when a node is unable to deallocate
it due to it still being in use on another node. When a node receives
the callback therefore, it knows that i_nlink must be zero, so we mark
it as such (in gfs2_drop_inode) in order that it will then attempt
deallocation of the inode itself.

As an additional benefit, queuing a demote request no longer requires
a memory allocation. This simplifies the code for dealing with gfs2_holders
as it removes one special case.

There are two new fields in struct gfs2_glock. gl_demote_state is the
state which the remote node has requested and gl_demote_time is the
time when the request came in. Both fields are only valid when the
GLF_DEMOTE flag is set in gl_flags.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:39 +01:00
Josef Whiter
1de9139092 [GFS2] Fix bz 231380, unlock page before dequeing glocks in gfs2_commit_write
If we are writing a file, and in the middle of writing the file
another node attempts to get a shared lock on that file (by doing a du for
example) the process doing the writing will hang waiting on lock_page.  The
reason for this is because when we have waiters on a exclusive glock, we will go
through and flush out all dirty pages associated with that inode and release the
lock.  The problem is that when we flush the dirty pages, we could hit a page
that we have locked durring the generic_file_buffered_write part of this
operation.  This patch unlocks the page before we go to dequeue the lock and
locks it immediatly afterwards, since generic_file_buffered_write needs the page
locked when the commit_write is completed.  This patch resolves the problem,
however if somebody sees a better way to do this please don't hesistate to yell.

Signed-off-by: Josef Whiter <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:37 +01:00
Patrick Caulfield
89adc934f3 [DLM] Fix uninitialised variable in receiving
The length of the second element of the kvec array was not initialised before
being added to the first one. This could cause invalid lengths to be passed to
kernel_recvmsg

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:34 +01:00
Josef Whiter
5c7342d894 [GFS2] fix bz 231369, gfs2 will oops if you specify an invalid mount option
If you specify an invalid mount option when trying to mount a gfs2 filesystem,
gfs2 will oops.  The attached patch resolves this problem.

Signed-off-by: Josef Whiter <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:32 +01:00
Robert Peterson
7c52b166c5 [GFS2] Add gfs2_tool lockdump support to gfs2 (bz 228540)
The attached patch resolves bz 228540.  This adds the capability
for gfs2 to dump gfs2 locks through the debugfs file system.
This used to exist in gfs1 as "gfs_tool lockdump" but it's missing from
gfs2 because all the ioctls were stripped out.  Please see the bugzilla
for more history about the fix.  This patch is also attached to the bugzilla
record.

The patch is against Steve Whitehouse's latest nmw git tree kernel
(2.6.21-rc1) and has been tested on system trin-10.

Signed-off-by: Robert Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01 09:10:29 +01:00
Neil Brown
83672d392f NFS: Fix directory caching problem - with test case and patch.
Try running this script in an NFS mounted directory (Client relatively
recent - 2.6.18 has the problem as does 2.6.20).

------------------------------------------------------
#!/bin/bash
#
# This script will produce the following errormessage from tar:
#
#   tar: newdir/innerdir/innerfile: file changed as we read it

# create dirs
rm -rf nfstest
mkdir -p nfstest/dir/innerdir

# create files (should not be empty)
echo "Hello World!" >nfstest/dir/file
echo "Hello World!" >nfstest/dir/innerdir/innerfile

# problem only happens if we sleep before chmod
sleep 1

# change file modes
chmod -R a+r nfstest

# rename dir
mv nfstest/dir nfstest/newdir

# tar it
tar -cf nfstest/nfstest.tar -C nfstest newdir

# restore old dir name
mv nfstest/newdir nfstest/dir
--------------------------------------------------------

What happens:

The 'chmod -R' does a readdir_plus in each directory and the results
get cached in the page cache.  It then updates the ctime on each file
by one second.  When this happens, the post-op attributes are used to
update the ctime stored on the client to match the value in the kernel.

The 'mv' calls shrink_dcache_parent on the directory tree which
flushes all the dentries (so a new lookup will be required) but
doesn't flush the inodes or pagecache.

The 'tar' does a readdir on each directory, but (in the case of
'innerdir' at least) satisfies it from the pagecache and uses the
READDIRPLUS data to update all the inodes.  In the case of
'innerdir/innerfile', the ctime is out of date.

'tar' then calls 'lstat' on innerdir/innerfile getting an old ctime.
It then opens the file (triggering a GETATTR), reads the content, and
then calls fstat to see if anything has changed.  It finds that ctime
has changed and so complains.

The problem seems to be that the cache readdirplus info is kept around
for too long.

My patch below discards pagecache data for directories when
dentry_iput is called on them.  This effectively removes the symptom
which convinces me that I correctly understand the problem.  However
I'm not convinced that is a proper solution, as there could easily be
other races that trigger the same problem without being affected by
this 'fix'.

One possibility would be to require that readdirplus pagecache data be
only used *once* to instantiate an inode.  Somehow it should then be
invalidated so that if the dentry subsequently disappears, it will
cause a new request to the server to fill in the stat data.

Another possibility is to compare the cache_change_attribute on the
inode with something similar for the readdirplus info and reject the
info from readdirplus if it is too old.

I haven't tried to implement these and would value other opinions
before I do.

Thanks,
NeilBrown


Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:19 -07:00
Neil Brown
1f4eab7e7c NFS: Set meaningful value for fattr->time_start in readdirplus results.
Don't use uninitialsed value for fattr->time_start in readdirplus results.

The 'fattr' structure filled in by nfs3_decode_direct does not get a
value for ->time_start set.
Thus if an entry is for an inode that we already have in cache,
when nfs_readdir_lookup calls nfs_fhget, it will call nfs_refresh_inode
and may update the inode with out-of-date information.

Directories are read a page at a time, so each page could have a
different timestamp that "should" be used to set the time_start for
the fattr for info in that page.  However storing the timestamp per
page is awkward.  (We could stick in the first 4 bytes and only read 4092
bytes, but that is a bigger code change than I am interested it).

This patch ignores the readdir_plus attributes if a readdir finds the
information already in cache, and otherwise sets ->time_start to the time
the readdir request was sent to the server.

It might be nice to store - in the directory inode - the time stamp for
the earliest readdir request that is still in the page cache, so that we
don't ignore attribute data that we don't have to.  This patch doesn't do
that.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:18 -07:00
Steve Dickson
74dd34e6e8 NFS: Added support to turn off the NFSv3 READDIRPLUS RPC.
READDIRPLUS can be a performance hindrance when the client is working with
large directories. In addition, some servers still have bugs in their
implementations (e.g. Tru64 returns wrong values for the fsid).

Add a mount flag to enable users to turn it off at mount time following the
implementation in Apple's NFS client.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:16 -07:00
Chuck Lever
00a6e7bbf9 SUNRPC: RPC client should retry with different versions of rpcbind
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:16 -07:00
Chuck Lever
df8b172a88 NFS: switch NFSROOT to use new rpcbind client
It is arguable whether NFSROOT will support IPv6, and thus whether
rpcb_getport_external needs to support rpcbind versions greater than 2.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:14 -07:00
Chuck Lever
2bea90d43a SUNRPC: RPC buffer size estimates are too large
The RPC buffer size estimation logic in net/sunrpc/clnt.c always
significantly overestimates the requirements for the buffer size.
A little instrumentation demonstrated that in fact rpc_malloc was never
allocating the buffer from the mempool, but almost always called kmalloc.

To compute the size of the RPC buffer more precisely, split p_bufsiz into
two fields; one for the argument size, and one for the result size.

Then, compute the sum of the exact call and reply header sizes, and split
the RPC buffer precisely between the two.  That should keep almost all RPC
buffers within the 2KiB buffer mempool limit.

And, we can finally be rid of RPC_SLACK_SPACE!

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:10 -07:00
Chuck Lever
511d2e8855 NLM: Shrink the maximum request size of NLM4 requests
NLM version 4 requests estimate the call and reply header sizes rather
conservatively, using the very maximum size allowed in the protocol even
though Linux always uses only a small fraction of the allowable space.

Reduce the size of caller and lock arguments to conserve RPC buffer space
while XDR encoding NLM4 arguments.  Add compile-time checks to ensure the
hostname string won't overflow NLM protocol maximums.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:09 -07:00
Trond Myklebust
ca52fec152 NFS: Use pgoff_t in structures and functions that pass page cache offsets
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:09 -07:00
Trond Myklebust
724c439c20 NFS: Clean up nfs_sync_mapping_wait()
It has no business touching wbc->pages_skipped.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:08 -07:00
Trond Myklebust
8d5658c949 NFS: Fix a buffer overflow in the allocation of struct nfs_read/writedata
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:07 -07:00
Trond Myklebust
c63c7b0513 NFS: Fix a race when doing NFS write coalescing
Currently we do write coalescing in a very inefficient manner: one pass in
generic_writepages() in order to lock the pages for writing, then one pass
in nfs_flush_mapping() and/or nfs_sync_mapping_wait() in order to gather
the locked pages for coalescing into RPC requests of size "wsize".

In fact, it turns out there is actually a deadlock possible here since we
only start I/O on the second pass. If the user signals the process while
we're in nfs_sync_mapping_wait(), for instance, then we may exit before
starting I/O on all the requests that have been queued up.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:06 -07:00
Trond Myklebust
8b09bee308 NFS: Cleanup for nfs_readpages()
Do the coalescing of read requests into block sized requests at start of
I/O as we scan through the pages instead of going through a second pass.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:05 -07:00
Trond Myklebust
bcb71bba7e NFS: Another cleanup of the read/write request coalescing code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:04 -07:00
Trond Myklebust
d8a5ad75cc NFS: Cleanup the coalescing code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:04 -07:00
Trond Myklebust
91e59c368c NFS: Don't wait for congestion in nfs_update_request()
It is redundant, and will interfere with the call to
balance_dirty_pages_ratelimited_nr in generic_file_write().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:03 -07:00
Amnon Aaronsohn
1a0ba9ae48 NFS: statfs error-handling fix
The nfs statfs function returns a success code on error, and fills the
output buffer with invalid values.  The attached patch makes it return a
correct error code instead.

Signed-off-by: Amnon Aaronsohn <amnonaar@gmail.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
 (Modified patch to reinstate the dprintk())
2007-04-30 22:17:02 -07:00
Trond Myklebust
d585158b60 NFS: Fix nfs_set_page_dirty()
Be more careful about testing page->mapping.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:02 -07:00
Jeff Mahoney
1173a729fc reiserfs: suppress lockdep warning
We're getting lockdep warnings due to a post-2.6.21-rc7 bugfix.

The xattr_sem can never be taken in the manner described. Internal inodes
are protected by I_PRIVATE.  Add the appropriate annotation.

Cc: <stable@kernel.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30 16:40:40 -07:00
Steve French
4523cc3044 [CIFS] UID/GID override on CIFS mounts to Samba
When CIFS Unix Extensions are negotiated we get the Unix uid and gid
owners of the file from the server (on the Unix Query Path Info
levels), but if the server's uids don't match the client uid's users
were having to disable the Unix Extensions (which turned off features
they still wanted).   The changeset patch allows users to override uid
and/or gid for file/directory owner with a default uid and/or gid
specified at mount (as is often done when mounting from Linux cifs
client to Windows server).  This changeset also displays the uid
and gid used by default in /proc/mounts (if applicable).

Also cleans up code by adding some of the missing spaces after
"if" keywords per-kernel style guidelines (as suggested by Randy Dunlap
when he reviewed the patch).

Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-04-30 20:13:06 +00:00
Linus Torvalds
cd9bb7e736 Merge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block:
  [PATCH] elevator: elv_list_lock does not need irq disabling
  [BLOCK] Don't pin lots of memory in mempools
  cfq-iosched: speedup cic rb lookup
  ll_rw_blk: add io_context private pointer
  cfq-iosched: get rid of cfqq hash
  cfq-iosched: tighten queue request overlap condition
  cfq-iosched: improve sync vs async workloads
  cfq-iosched: never allow an async queue idling
  cfq-iosched: get rid of ->dispatch_slice
  cfq-iosched: don't pass unused preemption variable around
  cfq-iosched: get rid of ->cur_rr and ->cfq_list
  cfq-iosched: slice offset should take ioprio into account
  [PATCH] cfq-iosched: style cleanups and comments
  cfq-iosched: sort IDLE queues into the rbtree
  cfq-iosched: sort RT queues into the rbtree
  [PATCH] cfq-iosched: speed up rbtree handling
  cfq-iosched: rework the whole round-robin list concept
  cfq-iosched: minor updates
  cfq-iosched: development update
  cfq-iosched: improve preemption for cooperating tasks
2007-04-30 08:12:39 -07:00
Jens Axboe
5972511b77 [BLOCK] Don't pin lots of memory in mempools
Currently we scale the mempool sizes depending on memory installed
in the machine, except for the bio pool itself which sits at a fixed
256 entry pre-allocation.

There's really no point in "optimizing" this OOM path, we just need
enough preallocated to make progress. A single unit is enough, lets
scale it down to 2 just to be on the safe side.

This patch saves ~150kb of pinned kernel memory on a 32-bit box.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30 09:08:17 +02:00
Paul Mackerras
49e1900d4c Merge branch 'linux-2.6' into for-2.6.22 2007-04-30 12:38:01 +10:00
Linus Torvalds
42fae7fb1c Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [NET]: Fix networking compilation errors
  [AF_RXRPC/AFS]: Arch-specific fixes.
  [AFS]: Fix VLocation record update wakeup
  [NET]: Revert sk_buff walker cleanups.
2007-04-27 16:20:37 -07:00
Linus Torvalds
f00546363f Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6: (46 commits)
  [MTD] [MAPS] drivers/mtd/maps/ck804xrom.c: convert pci_module_init()
  [MTD] [NAND] CM-x270 MTD driver
  [MTD] [NAND] Wrong calculation of page number in nand_block_bad()
  [MTD] [MAPS] fix plat-ram printk format
  [JFFS2] Fix compr_rubin.c build after include file elimination.
  [JFFS2] Handle inodes with only a single metadata node with non-zero isize
  [JFFS2] Tidy up licensing/copyright boilerplate.
  [MTD] [OneNAND] Exit loop only when column start with 0
  [MTD] [OneNAND] Fix access the past of the real oobfree array
  [MTD] [OneNAND] Update Samsung OneNAND official URL
  [JFFS2] Better fix for all-zero node headers
  [JFFS2] Improve read_inode memory usage, v2.
  [JFFS2] Improve failure mode if inode checking leaves unchecked space.
  [JFFS2] Fix cross-endian build.
  [MTD] Finish conversion mtd_blkdevs to use the kthread API
  [JFFS2] Obsolete dirent nodes immediately on unlink, where possible.
  Use menuconfig objects: MTD
  [MTD] mtd_blkdevs: Convert to use the kthread API
  [MTD] Fix fwh_lock locking
  [JFFS2] Speed up mount for directly-mapped NOR flash
  ...
2007-04-27 15:34:57 -07:00
David Howells
b1bdb691c3 [AF_RXRPC/AFS]: Arch-specific fixes.
Fixes for various arch compilation problems:

 (*) Missing module exports.

 (*) Variable name collision when rxkad and af_rxrpc both built in
     (rxrpc_debug).

 (*) Large constant representation problem (AFS_UUID_TO_UNIX_TIME).

 (*) Configuration dependencies.

 (*) printk() format warnings.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-27 15:28:45 -07:00
David Howells
47051a2152 [AFS]: Fix VLocation record update wakeup
Fix the wakeup transitions after a VLocation record update completes
one way or another.  This builds on Dave Miller's partial fix.

Also move wakeups outside the spinlocked sections.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-27 15:26:30 -07:00
Linus Torvalds
d868772fff Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (46 commits)
  dev_dbg: check dev_dbg() arguments
  drivers/base/attribute_container.c: use mutex instead of binary semaphore
  mod_sysfs_setup() doesn't return errno when kobject_add_dir() failure occurs
  s2ram: add arch irq disable/enable hooks
  define platform wakeup hook, use in pci_enable_wake()
  security: prevent permission checking of file removal via sysfs_remove_group()
  device_schedule_callback() needs a module reference
  s390: cio: Delay uevents for subchannels
  sysfs: bin.c printk fix
  Driver core: use mutex instead of semaphore in DMA pool handler
  driver core: bus_add_driver should return an error if no bus
  debugfs: Add debugfs_create_u64()
  the overdue removal of the mount/umount uevents
  kobject: Comment and warning fixes to kobject.c
  Driver core: warn when userspace writes to the uevent file in a non-supported way
  Driver core: make uevent-environment available in uevent-file
  kobject core: remove rwsem from struct subsystem
  qeth: Remove usage of subsys.rwsem
  PHY: remove rwsem use from phy core
  IEEE1394: remove rwsem use from ieee1394 core
  ...
2007-04-27 12:58:54 -07:00
David Woodhouse
d1da4e50e5 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/mtd/Kconfig

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-27 19:16:19 +01:00
James Morris
057f6c019f security: prevent permission checking of file removal via sysfs_remove_group()
Prevent permission checking from being performed when the kernel wants to
unconditionally remove a sysfs group, by introducing an kernel-only variant
of lookup_one_len(), lookup_one_len_kern().

Additionally, as sysfs_remove_group() does not check the return value of
the lookup before using it, a BUG_ON has been added to pinpoint the cause
of any problems potentially caused by this (and as a form of annotation).

Signed-off-by: James Morris <jmorris@namei.org>
Cc: Nagendra Singh Tomar <nagendra_tomar@adaptec.com>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-04-27 10:57:33 -07:00
Alan Stern
523ded71de device_schedule_callback() needs a module reference
This patch (as896b) fixes an oversight in the design of
device_schedule_callback().  It is necessary to acquire a reference to the
module owning the callback routine, to prevent the module from being
unloaded before the callback can run.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Satyam Sharma <satyam.sharma@gmail.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-04-27 10:57:32 -07:00
Andrew Morton
45cd8d8e1e sysfs: bin.c printk fix
fs/sysfs/bin.c: In function 'read':
fs/sysfs/bin.c:77: warning: format '%zd' expects type 'signed size_t', but argument 4 has type 'int'



Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-04-27 10:57:32 -07:00
Michael Ellerman
8447891fe8 debugfs: Add debugfs_create_u64()
I went to use this the other day, only to find it didn't exist.

It's a straight copy of the debugfs u32 code, then s/u32/u64/. A quick
test shows it seems to be working.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-04-27 10:57:31 -07:00
Adrian Bunk
3106d46f51 the overdue removal of the mount/umount uevents
This patch contains the overdue removal of the mount/umount uevents.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-04-27 10:57:31 -07:00
Linus Torvalds
b928ed5618 Merge branch 'for-linus' of git://git.infradead.org/ubi-2.6
* 'for-linus' of git://git.infradead.org/ubi-2.6:
  UBI: remove unused variable
  UBI: add me to MAINTAINERS
  JFFS2: add UBI support
  UBI: Unsorted Block Images
2007-04-27 10:42:35 -07:00
Linus Torvalds
ea6db58f3e Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (27 commits)
  ocfs2: Cache extent records
  ocfs2: Remember rw lock level during direct io
  ocfs2: Fix up i_blocks calculation to know about holes
  ocfs2: Fix extent lookup to return true size of holes
  ocfs2: Read from an unwritten extent returns zeros
  ocfs2: make room for unwritten extents flag
  ocfs2: Use own splice write actor
  ocfs2: Use do_sync_mapping_range() in ocfs2_zero_tail_for_truncate()
  [PATCH] Turn do_sync_file_range() into do_sync_mapping_range()
  ocfs2: zero tail of sparse files on truncate
  ocfs2: Teach ocfs2_get_block() about holes
  ocfs2: remove ocfs2_prepare_write() and ocfs2_commit_write()
  ocfs2: teach ocfs2_file_aio_write() about sparse files
  ocfs2: Turn off shared writeable mmap for local files systems with holes.
  ocfs2: abstract out allocation locking
  ocfs2: teach extend/truncate about sparse files
  ocfs2: temporarily remove extent map caching
  ocfs2: sparse b-tree support
  ocfs2: small cleanup of ocfs2_request_delete()
  ocfs2: remove unused code
  ...
2007-04-27 10:29:56 -07:00
Artem Bityutskiy
0029da3bf4 JFFS2: add UBI support
This patch make JFFS2 able to work with UBI volumes via the emulated MTD
devices which are directly mapped to these volumes.

Signed-off-by: Artem Bityutskiy <dedekind@infradead.org>
2007-04-27 14:24:08 +03:00
David S. Miller
39bf094930 [AFS]: Eliminate cmpxchg() usage in vlocation code.
cmpxchg() is not available on every processor so can't
be used in generic code.

Replace with spinlock protection on the ->state changes,
wakeups, and wait loops.

Add what appears to be a missing wakeup on transition
to AFS_VL_VALID state in afs_vlocation_updater().

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 20:39:14 -07:00
David S. Miller
ba3e0e1acc [AFS]: Fix u64 printing in debug logging.
Need 'unsigned long long' casts to quiet warnings on
64-bit platforms when using %ll on a u64.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 16:06:22 -07:00
David Howells
260a980317 [AFS]: Add "directory write" support.
Add support for the create, link, symlink, unlink, mkdir, rmdir and
rename VFS operations to the in-kernel AFS filesystem.

Also:

 (1) Fix dentry and inode revalidation.  d_revalidate should only look at
     state of the dentry.  Revalidation of the contents of an inode pointed to
     by a dentry is now separate.

 (2) Fix afs_lookup() to hash negative dentries as well as positive ones.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 15:59:35 -07:00
David Howells
c35eccb1f6 [AFS]: Implement the CB.InitCallBackState3 operation.
Implement the CB.InitCallBackState3 operation for the fileserver to
call.  This reduces the amount of network traffic because if this op
is aborted, the fileserver will then attempt an CB.InitCallBackState
operation.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 15:58:49 -07:00
David Howells
b908fe6b2d [AFS]: Add support for the CB.GetCapabilities operation.
Add support for the CB.GetCapabilities operation with which the fileserver can
ask the client for the following information:

 (1) The list of network interfaces it has available as IPv4 address + netmask
     plus the MTUs.

 (2) The client's UUID.

 (3) The extended capabilities of the client, for which the only current one
     is unified error mapping (abort code interpretation).

To support this, the patch adds the following routines to AFS:

 (1) A function to iterate through all the network interfaces using RTNETLINK
     to extract IPv4 addresses and MTUs.

 (2) A function to iterate through all the network interfaces using RTNETLINK
     to pull out the MAC address of the lowest index interface to use in UUID
     construction.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 15:58:17 -07:00
David Howells
00d3b7a453 [AFS]: Add security support.
Add security support to the AFS filesystem.  Kerberos IV tickets are added as
RxRPC keys are added to the session keyring with the klog program.  open() and
other VFS operations then find this ticket with request_key() and either use
it immediately (eg: mkdir, unlink) or attach it to a file descriptor (open).

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 15:57:07 -07:00
David Howells
436058a49e [AFS]: Handle multiple mounts of an AFS superblock correctly.
Handle multiple mounts of an AFS superblock correctly, checking to see
whether the superblock is already initialised after calling sget()
rather than just unconditionally stamping all over it.

Also delete the "silent" parameter to afs_fill_super() as it's not
used and can, in any case, be obtained from sb->s_flags.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 15:56:24 -07:00
David Howells
63b6be55e8 [AF_RXRPC]: Delete the old RxRPC code.
Delete the old RxRPC code as it's now no longer used.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 15:55:48 -07:00
David Howells
08e0e7c82e [AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC.
Make the in-kernel AFS filesystem use AF_RXRPC instead of the old RxRPC code.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 15:55:03 -07:00
David Howells
ec26815ad8 [AFS]: Clean up the AFS sources
Clean up the AFS sources.

Also remove references to AFS keys.  RxRPC keys are used instead.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 15:49:28 -07:00
Mark Fasheh
8341897882 ocfs2: Cache extent records
The extent map code was ripped out earlier because of an inability to deal
with holes. This patch adds back a simpler caching scheme requiring far less
code.

Our old extent map caching was designed back when meta data block caching in
Ocfs2 didn't work very well, resulting in many disk reads. These days our
metadata caching is much better, resulting in no un-necessary disk reads. As
a result, extent caching doesn't have to be as fancy, nor does it have to
cache as many extents. Keeping the last 3 extents seen should be sufficient
to give us a small performance boost on some streaming workloads.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:10:40 -07:00
Mark Fasheh
7cdfc3a1c3 ocfs2: Remember rw lock level during direct io
Cluster locking might have been redone because a direct write won't
complete, so this needs to be reflected in the iocb.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:07:45 -07:00
Mark Fasheh
8110b073a9 ocfs2: Fix up i_blocks calculation to know about holes
Older file systems which didn't support holes did a dumb calculation of
i_blocks based on i_size. This is no longer accurate, so fix things up to
take actual allocation into account.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:07:40 -07:00
Mark Fasheh
4f902c3772 ocfs2: Fix extent lookup to return true size of holes
Initially, we had wired things to return a size '1' of holes. Cook up a
small amount of code to find the next extent and calculate the number of
clusters between the virtual offset and the next allocated extent.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:45 -07:00
Mark Fasheh
49cb8d2d49 ocfs2: Read from an unwritten extent returns zeros
Return an optional extent flags field from our lookup functions and wire up
callers to treat unwritten regions as holes for the purpose of returning
zeros to the user.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:41 -07:00
Mark Fasheh
e48edee2d8 ocfs2: make room for unwritten extents flag
Due to the size of our group bitmaps, we'll never have a leaf node extent
record with more than 16 bits worth of clusters. Split e_clusters up so that
leaf nodes can get a flags field where we can mark unwritten extents.
Interior nodes whose length references all the child nodes beneath it can't
split their e_clusters field, so we use a union to preserve sizing there.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:37 -07:00
Mark Fasheh
6af67d8205 ocfs2: Use own splice write actor
We need to fill holes during a splice write. Provide our own splice write
actor which can call ocfs2_file_buffered_write() with a splice-specific
callback.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:34 -07:00
Mark Fasheh
fa41045fcb ocfs2: Use do_sync_mapping_range() in ocfs2_zero_tail_for_truncate()
Do this instead of filemap_fdatawrite() - this way we sync only the
range between i_size and the cluster boundary.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:30 -07:00
Mark Fasheh
5b04aa3a64 [PATCH] Turn do_sync_file_range() into do_sync_mapping_range()
do_sync_file_range() accepts a file * from which it takes an address_space to
sync.  Abstract out the bulk of the function into do_sync_mapping_range()
which takes the address_space directly.  This way callers who want to sync an
address_space directly can take advantage of the functionality provided.

do_sync_file_range() is preserved as a small wrapper around
do_sync_mapping_range().

Ocfs2 in particular would like to use this to initiate a sync of a specific
inode range during truncate, where a file * may not be available.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-04-26 15:02:26 -07:00
Mark Fasheh
60b11392f1 ocfs2: zero tail of sparse files on truncate
Since we don't zero on extend anymore, truncate needs to be fixed up to zero
the part of a file between i_size and and end of it's cluster. Otherwise a
subsequent extend could expose bad data.

This introduced a new helper, which can be used in ocfs2_write().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:20 -07:00
Mark Fasheh
25baf2da14 ocfs2: Teach ocfs2_get_block() about holes
ocfs2_get_block() didn't understand sparse files, fix that. Also remove some
code that isn't really useful anymore. We can fix up
ocfs2_direct_IO_get_blocks() at the same time.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:16 -07:00
Mark Fasheh
5069120b72 ocfs2: remove ocfs2_prepare_write() and ocfs2_commit_write()
These are no longer used, and can't handle file systems with sparse file
allocation.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:12 -07:00
Mark Fasheh
9517bac6cc ocfs2: teach ocfs2_file_aio_write() about sparse files
Unfortunately, ocfs2 can no longer make use of generic_file_aio_write_nlock()
because allocating writes will require zeroing of pages adjacent to the I/O
for cluster sizes greater than page size.

Implement a custom file write here, which can order page locks for zeroing.
This also has the advantage that cluster locks can easily be ordered outside
of the page locks.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:08 -07:00
Mark Fasheh
89488984ac ocfs2: Turn off shared writeable mmap for local files systems with holes.
This will be turned back on once we can do allocation in ->page_mkwrite().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:02:01 -07:00
Mark Fasheh
abf8b15694 ocfs2: abstract out allocation locking
Right now, file allocation for ocfs2 is done within ocfs2_extend_file(),
which is either called from ->setattr() (for an i_size change), or at the
top of ocfs2_file_aio_write().

Inodes on file systems with sparse file support will want to do their
allocation during the actual write call.

In either case the cluster locking decisions are the same. We abstract out
that code into a new function, ocfs2_lock_allocators() which will be used by
a later patch to enable writing to sparse files.

This also provides a nice cleanup of ocfs2_extend_allocation().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:01:58 -07:00
Mark Fasheh
3a0782d09c ocfs2: teach extend/truncate about sparse files
For ocfs2_truncate_file(), we eliminate the "simple" truncate case which no
longer exists since i_size is not tied to i_clusters. In
ocfs2_extend_file(), we skip the allocation / page zeroing code for file
systems which understand sparse files.

The core truncate code is changed to do a bottom up tree traversal. This
gets abstracted out into it's own function. To make things more readable,
most of the special case handling for in-inode extents from
ocfs2_do_truncate() is also removed.

Though write support for sparse files comes in a later patch, we at least
update ocfs2_prepare_inode_for_write() to skip allocation for sparse files.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:01:56 -07:00
Mark Fasheh
363041a5f7 ocfs2: temporarily remove extent map caching
The code in extent_map.c is not prepared to deal with a subtree being
rotated between lookups. This can happen when filling holes in sparse files.
Instead of a lengthy patch to update the code (which would likely lose the
benefit of caching subtree roots), we remove most of the algorithms and
implement a simple path based lookup. A less ambitious extent caching scheme
will be added in a later patch.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 15:01:31 -07:00
Mark Fasheh
dcd0538ff4 ocfs2: sparse b-tree support
Introduce tree rotations into the b-tree code. This will allow ocfs2 to
support sparse files. Much of the added code is designed to be generic (in
the ocfs2 sense) so that it can later be re-used to implement large
extended attributes.

This patch only adds the rotation code and does minimal updates to callers
of the extent api.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 14:44:03 -07:00
Mark Fasheh
6f16bf655c ocfs2: small cleanup of ocfs2_request_delete()
There are two checks in there (one for inode newness, one for other mounted
nodes) which are unnecessary, so remove them. The DLM will allow the trylock
in either case without any messaging overhead.

Removing these makes ocfs2_request_delete() a one liner function, so just
move the trylock out one level into ocfs2_query_inode_wipe().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 14:40:55 -07:00
Tiger Yang
68e2b740c4 ocfs2: remove unused code
Remove node messaging code that becomes unused with the delete inode vote
removal.

[Removed even more cruft which I spotted during review --Mark]

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 14:40:16 -07:00
Tiger Yang
500086300e ocfs2: Remove delete inode vote
Ocfs2 currently does cluster-wide node messaging to check the open state of
an inode during delete. This patch removes that mechanism in favor of an
inode cluster lock which is taken at shared read when an inode is first read
and dropped in clear_inode(). This allows a deleting node to test the
liveness of an inode by attempting to take an exclusive lock.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 14:39:48 -07:00
Mark Fasheh
a9f5f70739 ocfs2: filter more error prints
We don't want to print anything at all in ocfs2_lookup() when getting an
error from ocfs2_iget() - it could be something as innocuous as a signal
being detected in the dlm.

ocfs2_permission() should filter on -ENOENT which ocfs2_meta_lock() can
return if the inode was deleted on another node.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:39:08 -07:00
Sunil Mushran
bebe6f120b ocfs2: Replace panic() with emergency_restart() when fencing
We have noticed panic() hanging leading us to a situation in which
the node, while otherwise dead, is still disk heartbeating. This
leads to a hung cluster as the other nodes are waiting for this
node to stop disk heartbeating. This situation is only resolved
by power resetting the box.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:39:02 -07:00
Sunil Mushran
5d262cc7dd ocfs2: Silence compiler warnings
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:38:55 -07:00
Mark Fasheh
be9e986b82 ocfs2: Local mounts should skip inode updates
We don't want the extent map and uptodate cache destruction in
ocfs2_meta_lock_update() on a local mount, so skip that.

This fixes several bugs with uptodate being cleared on buffers and extent
maps being corrupted.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:35:21 -07:00
Sunil Mushran
0d01af6e5d ocfs2_dlm: Call cond_resched_lock() once per hash bucket scan
In dlm_migrate_all_locks(), we currently call cond_resched_lock() after
processing each lockres in a hash bucket. Move it outside the loop so as to
call it only after the entire hash bucket has been processed.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:33:11 -07:00
Srinivas Eeda
756a1501dd ocfs2_dlm: fix race in dlm_remaster_locks
There is a possibility that dlm_remaster_locks could overwride node->state
with DLM_RECO_NODE_DATA_REQUESTED after dlm_reco_data_done_handler sets the
node->state to DLM_RECO_NODE_DATA_DONE. This could lead to recovery getting
stuck and requires a cluster reboot. Synchronize with dlm_reco_state_lock
spinlock.

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26 13:33:02 -07:00
Steve French
984acfe1cf [CIFS] prefixpath mounts to servers supporting posix paths used wrong slash
Acked-by: Alexander Bokovoy <abokovoy@ru.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-04-26 16:42:50 +00:00
Steve French
deb0420c6f [CIFS] Update cifs version to 1.49
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-04-26 14:35:54 +00:00
Milind Arun Choudhary
3cbb1c8e1a JFS: use __set_current_state()
use __set_current_state(TASK_*) instead of current->state = TASK_*

Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
2007-04-26 07:30:29 -05:00
David Woodhouse
ef2e58ea6b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2007-04-26 09:31:28 +01:00
Andrew Morton
f6449f4ece [JFFS2] Fix compr_rubin.c build after include file elimination.
It seems to be silly season lately.

(Oops, test builds are more useful if the file in question is actually
configured on. dwmw2).

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-26 07:27:04 +01:00
Patrick McHardy
af65bdfce9 [NETLINK]: Switch cb_lock spinlock to mutex and allow to override it
Switch cb_lock to mutex and allow netlink kernel users to override it
with a subsystem specific mutex for consistent locking in dump callbacks.
All netlink_dump_start users have been audited not to rely on any
side-effects of the previously used spinlock.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:03 -07:00
Arnaldo Carvalho de Melo
b529ccf279 [NETLINK]: Introduce nlmsg_hdr() helper
For the common "(struct nlmsghdr *)skb->data" sequence, so that we reduce the
number of direct accesses to skb->data and for consistency with all the other
cast skb member helpers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:34 -07:00
Eric Dumazet
ae40eb1ef3 [NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolution
Now network timestamps use ktime_t infrastructure, we can add a new
ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'.
User programs can thus access to nanosecond resolution.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:04 -07:00
David Woodhouse
61c4b23770 [JFFS2] Handle inodes with only a single metadata node with non-zero isize
This should never happen unless there's corruption on the medium and the
actual data nodes go missing. But the failure mode (an oops when we assume
the fragtree isn't empty and go looking for its last node) isn't useful.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-25 17:04:23 +01:00
Dave Kleikamp
3e2221c73c Copy i_flags to jfs inode flags on write
This mirrors Jan Kara's patches for ext3.  This patch makes sure that
changes made to inode->i_flags are reflected on disk for jfs.  It also
moves a call of jfs_set_inode_flags() to be more consistent with where
jfs_get_inode_flags() is called.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
2007-04-25 09:36:20 -05:00
David Woodhouse
c00c310eac [JFFS2] Tidy up licensing/copyright boilerplate.
In particular, remove the bit in the LICENCE file about contacting
Red Hat for alternative arrangements. Their errant IS department broke
that arrangement a long time ago -- the policy of collecting copyright
assignments from contributors came to an end when the plug was pulled on
the servers hosting the project, without notice or reason.

We do still dual-license it for use with eCos, with the GPL+exception
licence approved by the FSF as being GPL-compatible. It's just that nobody
has the right to license it differently.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-25 14:16:47 +01:00
vignesh
eaa33a9ac0 [CIFS] Replace kmalloc/memset combination with kzalloc
Signed-off-by: Vignesh Babu <vignesh.babu@wipro.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-04-25 12:13:48 +00:00
Steve French
5858ae44e2 [CIFS] Add IPv6 support
IPv6 support was started a few years ago in the cifs client, but lacked a
kernel helper function for parsing the ascii form of the ipv6 address. Now
that that is added (and now IPv6 is the default that some OS use now) it
was fairly easy to finish  the cifs ipv6 support.  This  requires that
CIFS_EXPERIMENTAL be enabled and (at least until the mount.cifs module is
modified to use a new ipv6 friendly call instead of gethostbyname) and the
ipv6 address be passed on the mount as "ip=" mount option.

Thanks

Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-04-25 11:59:10 +00:00
Steve French
cbac3cba66 [CIFS] New CIFS POSIX mkdir performance improvement (part 2)
Fix incorrect parsing of return data

Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-04-25 11:46:06 +00:00
Joakim Tjernlund
0dec4c8bc6 [JFFS2] Better fix for all-zero node headers
No need to check for all-zero header since the header cannot
be zero due to other checks.

Replace the all-zero header check in readinode.c with a
check for the magic word.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-25 04:13:06 +01:00
David Woodhouse
df8e96f391 [JFFS2] Improve read_inode memory usage, v2.
We originally used to read every node and allocate a jffs2_tmp_dnode_info
structure for each, before processing them in (reverse) version order
and discarding the ones which are obsoleted by later nodes.

With huge logfiles, this behaviour caused memory problems. For example, a
file involved in OLPC trac #1292 has 1822391 nodes, and would cause the XO
machine to run out of memory during the first stage of read_inode().

Instead of just inserting nodes into a tree in version order as we find
them, we now put them into a tree in order of their offset within the
file, which allows us to immediately discard nodes which are completely
obsoleted.

We don't use a full tree with 'fragments' pointing to the real data
structure, as we do in the normal fragtree. We sort only on the start
address, and add an 'overlapped' flag to the tmp_dnode_info to indicate
that the node in question is (partially) overlapped by another.

When the scan is complete, we start at the end of the file, adding each
node to a real fragtree as before. Where the node is non-overlapped, we
just add it (it doesn't matter that it's not the latest version; there is
no overlap). When the node at the end of the tree _is_ overlapped, we sort
it and all its overlapping nodes into version order and then add them to
the fragtree in that order.

This 'early discard' reduces the peak allocation of tmp_dnode_info
structures from 1.8M to a mere 62872 (3.5%) in the degenerate case
referenced above.

This version of the patch also correctly rememembers the highest node
version# seen for an inode when it's scanned.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-25 03:23:42 +01:00
Jeff Mahoney
9b7f375505 reiserfs: fix xattr root locking/refcount bug
The listxattr() and getxattr() operations are only protected by a read
lock.  As a result, if either of these operations run in parallel, a race
condition exists where the xattr_root will end up being cached twice, which
results in the leaking of a reference and a BUG() on umount.

This patch refactors get_xa_root(), __get_xa_root(), and create_xa_root(),
into one get_xa_root() function that takes the appropriate locking around
the entire critical section.

Reported, diagnosed and tested by Andrea Righi <a.righi@cineca.it>

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Andrea Righi <a.righi@cineca.it>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: Edward Shishkin <edward@namesys.com>
Cc: Alex Zarochentsev <zam@namesys.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-24 08:23:09 -07:00
Latchesar Ionkov
c959df9f01 v9fs: don't use primary fid when removing file
v9fs_insert uses v9fs_fid_lookup (which also locks the fid) to get the
primary fid associated with the dentry and destroys the v9fs_fid struct
after removing the file.  If another process called v9fs_fid_lookup on the
same dentry, it may wait undefinitely for the fid's lock (as the struct is
freed).

This patch changes v9fs_remove to use a cloned fid, so the primary fid is
not locked and freed.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Cc: Eric Van Hensbergen <ericvh@hera.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-24 08:23:08 -07:00
Steve French
2dd29d3133 [CIFS] New CIFS POSIX mkdir performance improvement
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-04-23 22:07:35 +00:00
David Woodhouse
44b998e1eb [JFFS2] Improve failure mode if inode checking leaves unchecked space.
We should never find the unchecked size is non-zero after we've finished
checking all inodes. If it happens, used to BUG(), leaving the alloc_sem
held and deadlocking. Instead, just return -ENOSPC after complaining. The
GC thread will die, but read-only operation should be able to continue and
the file system should be unmountable.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-23 12:11:46 +01:00
David Woodhouse
566865a2a4 [JFFS2] Fix cross-endian build.
When compiling a LE-capable JFFS2 on PowerPC, wbuf.c fails to compile:

fs/jffs2/wbuf.c:973: error: braced-group within expression allowed only inside a function
fs/jffs2/wbuf.c:973: error: initializer element is not constant
fs/jffs2/wbuf.c:973: error: (near initialization for ‘oob_cleanmarker.magic’)
fs/jffs2/wbuf.c:974: error: braced-group within expression allowed only inside a function
fs/jffs2/wbuf.c:974: error: initializer element is not constant
fs/jffs2/wbuf.c:974: error: (near initialization for ‘oob_cleanmarker.nodetype’)
fs/jffs2/wbuf.c:975: error: braced-group within expression allowed only inside a function
fs/jffs2/wbuf.c:976: error: initializer element is not constant
fs/jffs2/wbuf.c:976: error: (near initialization for ‘oob_cleanmarker.totlen’)

Provide constant_cpu_to_je{16,32} functions, and use them for initialising the
offending structure.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-23 12:07:17 +01:00
Trond Myklebust
2b82f190c8 NFS: Fix race in nfs_set_page_dirty
Protect nfs_set_page_dirty() against races with nfs_inode_add_request.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20 22:56:30 -07:00
Trond Myklebust
612c9384fd NFS: Fix the 'desynchronized value of nfs_i.ncommit' error
Redirtying a request that is already marked for commit will screw up the
accounting for NR_UNSTABLE_NFS as well as nfs_i.ncommit.
Ensure that all requests on the commit queue are labelled with the
PG_NEED_COMMIT flag, and avoid moving them onto the dirty list inside
nfs_page_mark_flush().

Also inline nfs_mark_request_dirty() into nfs_page_mark_flush() for
atomicity reasons. Avoid dropping the spinlock until we're done marking the
request in the radix tree and have added it to the ->dirty list.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20 22:56:29 -07:00
Trond Myklebust
6d677e3504 NFS: Don't clear PG_writeback until after we've processed unstable writes
Ensure that we don't release the PG_writeback lock until after the page has
either been redirtied, or queued on the nfs_inode 'commit' list.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20 22:56:29 -07:00
Trond Myklebust
8e821cad12 NFS: clean up the unstable write code
Get rid of the inlined #ifdefs.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20 22:56:29 -07:00
Joakim Tjernlund
a491486a20 [JFFS2] Obsolete dirent nodes immediately on unlink, where possible.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-20 23:09:28 -04:00
Evgeniy Dushistov
07a0cfec30 ufs proper handling of zero link case
This patch should fix or partly fix this bug:
http://bugzilla.kernel.org/show_bug.cgi?id=8276

The problem is:

- if we see "zero link case" during reading inode operation, we call
  ufs_error(which remount fs readonly), but not "mark" inode as bad (1)

- in readonly case we do not fill some data structures, which are used in
  read and write case (2)

- VFS call ufs_delete_inode if link count is zero (3)

so (1)->(3)->(2) cause oops, this patch should fix such scenario

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Jim Paris <jim@jtan.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-17 16:36:27 -07:00
Alan Cox
c4bbafda70 exec.c: fix coredump to pipe problem and obscure "security hole"
The patch checks for "|" in the pattern not the output and doesn't nail a
pid on to a piped name (as it is a program name not a file)

Also fixes a very very obscure security corner case.  If you happen to have
decided on a core pattern that starts with the program name then the user
can run a program called "|myevilhack" as it stands.  I doubt anyone does
this.

Signed-off-by: Alan Cox <alan@redhat.com>
Confirmed-by: Christopher S. Aker <caker@theshore.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-17 16:36:26 -07:00
Joakim Tjernlund
c2aecda79c [JFFS2] Speed up mount for directly-mapped NOR flash
Remove excessive scanning of empty flash after a clean
marker for users of the point/unpoint method. cfi_cmdset_0001
uses point/unpoint by default iff flash mapping is linear.
The speedup is several orders of magnitude if FS is less than
half full.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-17 14:07:34 -04:00
Artem Bityutskiy
10731f8300 [JFFS2] fix buffer sise calculations in jffs2_get_inode_nodes()
In read inode we have an optimization which prevents one
min. I/O unit (e.g. NAND page) to be read more then once.

Namely, at the beginning we do not know which node type we read,
so we read so we assume we read the directory entry, because it
has the smallest node header. When we read it, we read up to the
next min. I/O unit, just because if later we'll need to read more,
we already have this data.

If it turns out to be that the node is not directory entry, and
we need more data, and we did not read it because it sits in the
next min. I/O unit, we read the whole next (or several next)
min. I/O unit(s). And if it happens to be that we read a data node,
and we've read part of its data, we calculate partial CRC.
So if later we need to check data CRC, we'll only read the rest
of the data from further min. I/O units and continue CRC checking.

This code was a bit messy and buggy. The bug was that it assumed
relatively large min. I/O unit, so that the largest node header
could overlap only one min. I/O unit boundary.

This parch clean-ups the code a bit and fixes this bug.
The patch was not tested on flash with small min. I/O unit, like
NOR-ECC, nut it was tested on NAND with 512 bytes NAND page, so
it at least does not break NAND. It was also tested with mtdram
so it should not break NOR.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-17 14:05:48 -04:00
Adrian Hunter
7f762ab24c [JFFS2] Disable summary after wbuf recovery
After a write error, any data in the write buffer must
be relocated.  This is handled by the jffs2_wbuf_recover
function.  This function does not fix up the erase block
summary information that is collected for writing at the
end of the block, which results in an incorrect summary
(or BUG if the summary was found to be empty).

As the summary is not essential (it is an optimisation),
it may be disabled for the current erase block when this
situation arises.  This patch does that.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-17 13:56:44 -04:00
Adrian Hunter
99c2594f0e [JFFS2] Prevent list corruption when handling write errors
If a write error occurs, the affected block is placed on the
bad_used_list.  In the case that the write error occured
when writing summary data the block was also being placed on
the dirty_list, which caused list corruption and ultimately
a soft lockup in jffs2_mark_node_obsolete. This fixes that.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-17 13:56:23 -04:00
Artem Bityutskiy
b0afbbec49 [JFFS2] fix deadlock on error path
When the MTD driver returns write failure, the following deadlock
occurs:

We are in __jffs2_flush_wbuf(), we hold &c->wbuf_sem. Write failure.
jffs2_wbuf_recover()->jffs2_reserve_space_gc()->jffs2_do_reserve_space()
->jffs2_erase_pending_blocks()->jffs2_flash_read()

and it tries to lock &c->wbuf_sem again. Deadlock.

Reported-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-17 13:53:51 -04:00
Thomas Gleixner
53043002ef [JFFS2] check node crc before doing anything else
Check the node CRC on scan before doing anything else with the node.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-17 18:26:18 +01:00
J. Bruce Fields
c2fa1b8a6c locks: create posix-to-flock helper functions
Factor out a bit of messy code by creating posix-to-flock counterparts
to the existing flock-to-posix helper functions.

Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-04-16 13:40:37 -04:00
J. Bruce Fields
226a998dbf locks: trivial removal of unnecessary parentheses
Remove some unnecessary parentheses.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-04-16 13:40:37 -04:00
Trond Myklebust
eb4cac10d9 NFS: Fix a list corruption problem
We must remove the request from whatever list it is currently on before we
can add it to the dirty list.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-15 16:48:11 -07:00
Trond Myklebust
5a6d41b32a NFS: Ensure PG_writeback is cleared when writeback fails
If the writebacks are cancelled via nfs_cancel_dirty_list, or due to the
memory allocation failing in nfs_flush_one/nfs_flush_multi, then we must
ensure that the PG_writeback flag is cleared.

Also ensure that we actually own the PG_writeback flag whenever we
schedule a new writeback by making nfs_set_page_writeback() return the
value of test_set_page_writeback().
The PG_writeback page flag ends up replacing the functionality of the
PG_FLUSHING nfs_page flag, so we rip that out too.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-14 21:46:48 -07:00
Trond Myklebust
60fa3f769f NFS: Fix two bugs in the O_DIRECT write code
Do not flag an error if the COMMIT call fails and we decide to resend the
writes. Let the resend flag the error if it fails.

If a write has failed, then nfs_direct_write_result should not attempt to
send a commit. It should just exit asap and return the error to the user.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-14 21:46:48 -07:00
Trond Myklebust
e1552e1998 NFS: Fix an Oops in nfs_setattr()
It looks like nfs_setattr() and nfs_rename() also need to test whether the
target is a regular file before calling nfs_wb_all()...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-14 21:46:47 -07:00
Jeff Mahoney
c3724b129b [PATCH] autofs4: fix race in unhashed dentry code
Commit f50b6f8691 introduced a race in
autofs4 between autofs_lookup_unhashed() and autofs_dentry_release().

autofs_dentry_release() ends up clearing the ->dentry and ->inode members
of autofs_info before removing it from the rehash list.  The list is
protected by the rehash lock in both functions, but since
autofs_dentry_release() starts tearing the autofs_info struct down before
removing it from the list, autofs_lookup_unhashed() can get a autofs_info
with a NULL dentry.

This patch moves the clearing of ->dentry and ->inode after the removal
from the rehash list.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-12 15:31:42 -07:00
Vladimir Saveliev
6d205f1205 [PATCH] reiserfs: fix key decrementing
This patch fixes a bug in function decrementing a key of stat data item.

Offset of reiserfs keys are compared as signed values.  To set key offset
to maximal possible value maximal signed value has to be used.

This bug is responsible for severe reiserfs filesystem corruption which
shows itself as warning vs-13060.  reiserfsck fixes this corruption by
filesystem tree rebuilding.

Signed-off-by: Vladimir Saveliev <vs@namesys.com>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-12 15:31:42 -07:00
Stephen Rothwell
1a38147ed0 [POWERPC] Make struct property's value a void *
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13 03:55:18 +10:00
Timo Savola
a5bfffac64 [PATCH] fuse: validate rootmode mount option
If rootmode isn't valid, we hit the BUG() in fuse_init_inode.  Now
EINVAL is returned.

Signed-off-by: Timo Savola <tsavola@movial.fi>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-08 19:47:55 -07:00
Steve French
5268df2ead [CIFS] Add write perm for usr to file on windows should remove r/o dos attr
Remove read only dos attribute on chmod when adding any write permission (ie on any of
user/group/other (not all of user/group/other ie  0222) when
mounted to windows.

Suggested by: Urs Fleisch

Signed-off-by: Urs Fleisch <urs.fleisch@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-04-06 19:28:16 +00:00
Andrew Morton
2363cc0264 [PATCH] remove protection of LANANA-reserved majors
Revert all this.  It can cause device-mapper to receive a different major from
earlier kernels and it turns out that the Amanda backup program (via GNU tar,
apparently) checks major numbers on files when performing incremental backups.

Which is a bit broken of Amanda (or tar), but this feature isn't important
enough to justify the churn.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-04 21:12:47 -07:00
Steve French
3a9f462f6d [CIFS] Remove unnecessary parm to cifs_reopen_file
Also expand debug entry to show which character on a failed Unicode
mapping.

Acked-by: Shaggy <shaggy@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-04-04 17:10:24 +00:00
Igor Mammedov
aaf737adb6 [CIFS] Switch cifsd to kthread_run from kernel_thread
cifsd was the only cifs thread that had not been switched to the newer
kthread interface

Signed-off-by: Igor Mammedov <niallain at gmail.com>
Signed-off-by: Wilhelm Meier <wilhelm.meier@fh-kl.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-04-03 19:16:43 +00:00
Christoph Hellwig
c33f8d3274 [CIFS] Remove unnecessary checks
file->f_path.dentry or file->f_path.dentry.d_inode can't be NULL since at
least ten years, similar for all but very few arguments passed in from the
VFS.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-04-02 18:47:20 +00:00
Robert P. J. Day
8dc64fca75 [JFFS2] Delete everything related to obsolete JFFS2_PROC option
Delete everything related to the apparently non-existent kernel config
option JFFS2_PROC.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-04-02 14:11:25 -04:00
Andrew Morton
7479d2b90b [PATCH] revert "retries in ext4_prepare_write() violate ordering requirements"
Revert b46be05004.  Same reasoning as for ext3.

Cc: Kirill Korotaev <dev@openvz.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ken Chen <kenneth.w.chen@intel.com>
Cc: Andrey Savochkin <saw@sw.ru>
Cc: <linux-ext4@vger.kernel.org>
Cc: Dmitriy Monakhov <dmonakhov@openvz.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-02 10:06:08 -07:00
Andrew Morton
1aa9b4b9bc [PATCH] revert "retries in ext3_prepare_write() violate ordering requirements"
Revert e92a4d595b.

Dmitry points out

"When we block_prepare_write() failed while ext3_prepare_write() we jump to
 "failure" label and call ext3_prepare_failure() witch search last mapped bh
 and invoke commit_write untill it.  This is wrong!!  because some bh from
 begining to the last mapped bh may be not uptodate.  As a result we commit to
 disk not uptodate page content witch contains garbage from previous usage."

and

"Unexpected file size increasing."

   Call trace the same as it was in first issue but result is different.
   For example we have file with i_size is zero.  we want write two blocks ,
   but fs has only one free block.

   ->ext3_prepare_write(...from == 0, to == 2048)
     retry:
     ->block_prepare_write() == -ENOSPC# we failed but allocated one block here.
     ->ext3_prepare_failure()
       ->commit_write( from == 0, to == 1024) # after this i_size becomes 1024 :)
     if (ret == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, &retries))
        goto retry;

   Finally when all retries will be spended ext3_prepare_failure return
   -ENOSPC, but i_size was increased and later block trimm procedures can't
   help here.

We don't appear to have the horsepower to fix these issues, so let's put
things back the way they were for now.

Cc: Kirill Korotaev <dev@openvz.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ken Chen <kenneth.w.chen@intel.com>
Cc: Andrey Savochkin <saw@sw.ru>
Cc: <linux-ext4@vger.kernel.org>
Cc: Dmitriy Monakhov <dmonakhov@openvz.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-02 10:06:08 -07:00
Brian Pomerantz
0322170260 [PATCH] fix page leak during core dump
When the dump cannot occur most likely because of a full file system and
the page to be written is the zero page, the call to page_cache_release()
is missed.

Signed-off-by: Brian Pomerantz <bapper@mvista.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-02 10:06:08 -07:00
Andrew Morton
05565b65a5 [PATCH] proc: fix linkage with CONFIG_SYSCTL=y, CONFIG_PROC_SYSCTL=n
We're using #ifdef CONFIG_SYSCTL, but we should be using CONFIG_PROC_SYSCTL,
so we get

 fs/built-in.o: In function `proc_root_init':
 /usr/src/linux/fs/proc/root.c:83: undefined reference to `proc_sys_init'

Fix that up and remove an ifdef-in-C.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Helge Hafting <helgehaf@aitel.hist.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-02 10:06:08 -07:00
Linus Torvalds
22c8c65d24 Merge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block:
  [PATCH] splice: partial write fix
2007-03-29 08:23:52 -07:00
Paolo 'Blaisorblade' Giarrusso
75e8defbe4 [PATCH] uml: hostfs variable renaming
* rename name to host_root_path
* rename data to req_root.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-29 08:22:25 -07:00
Jeff Dike
622e696938 [PATCH] uml: fix compilation problems
Fix a few miscellaneous compilation problems -
	an assignment with mismatched types in ldt.c
	a missing include in mconsole.h which needs a definition of uml_pt_regs
	I missed removing an include of user_util.h in hostfs

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-29 08:22:25 -07:00
Dmitriy Monakhov
d9993c37ef [PATCH] splice: partial write fix
Currently if partial write has happened while ->commit_write() then page
wasn't marked as accessed and rebalanced.

Signed-off-by: Monakhov Dmitriy <dmonakhov@openvz.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-03-29 14:26:42 +02:00
Linus Torvalds
e5c465f5d9 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2_dlm: Check for migrateable lockres in dlm_empty_lockres()
  ocfs2_dlm: Fix lockres ref counting bug
2007-03-28 14:02:03 -07:00
Jeff Garzik
a9c87a10db Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream-fixes 2007-03-28 02:21:18 -04:00
Zach Brown
28defbea64 [PATCH] aio: remove bare user-triggerable error printk
The user can generate console output if they cause do_mmap() to fail
during sys_io_setup().  This was seen in a regression test that does
exactly that by spinning calling mmap() until it gets -ENOMEM before
calling io_setup().

We don't need this printk at all, just remove it.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-27 17:53:25 -07:00
Jean Tourrilhes
ed4bb10631 [PATCH] wext: Add missing ioctls to 64<->32 conversion
Johannes Berg and Michael Buesch noticed that the WPA ioctls
were missing from the 64<->32 bit conversion. This means that when
using a 32 bits userspace on a 64 bit kernel, those ioctls fail.

Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-03-27 14:10:17 -04:00
Linus Torvalds
e0ab0bb6d2 Merge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block:
  Export __splice_from_pipe()
  2/2 splice: dont readpage
  1/2 splice: dont steal
  make elv_register() output atomic
  block: blk_max_pfn is somtimes wrong
2007-03-27 09:05:49 -07:00
Mika Kukkonen
5c46010af2 [PATCH] Fix kernel build with EMBEDDED & PROC_FS & !PROC_SYSCTL
Without attached patch against current -git I get following with
!PROC_SYSCTL (with EMBEDDED and PROC_FS set):

    CC      init/version.o
    LD      init/built-in.o
    LD      vmlinux
  fs/built-in.o: In function `do_proc_sys_lookup':
  proc_sysctl.c:(.text+0x26583): undefined reference to `sysctl_head_next'
  fs/built-in.o: In function `proc_sys_revalidate':
  proc_sysctl.c:(.text+0x265bb): undefined reference to `sysctl_head_finish'
  fs/built-in.o: In function `proc_sys_readdir':
  proc_sysctl.c:(.text+0x26720): undefined reference to `sysctl_head_next'
  proc_sysctl.c:(.text+0x267d8): undefined reference to `sysctl_head_finish'
  proc_sysctl.c:(.text+0x268e7): undefined reference to `sysctl_head_next'
  proc_sysctl.c:(.text+0x26910): undefined reference to `sysctl_head_finish'
  fs/built-in.o: In function `proc_sys_write':
  proc_sysctl.c:(.text+0x2695d): undefined reference to `sysctl_perm'
  proc_sysctl.c:(.text+0x2699c): undefined reference to `sysctl_head_finish'
  fs/built-in.o: In function `proc_sys_read':
  proc_sysctl.c:(.text+0x269e9): undefined reference to `sysctl_perm'
  proc_sysctl.c:(.text+0x26a25): undefined reference to `sysctl_head_finish'
  fs/built-in.o: In function `proc_sys_permission':
  proc_sysctl.c:(.text+0x26ad1): undefined reference to `sysctl_perm'
  proc_sysctl.c:(.text+0x26adb): undefined reference to `sysctl_head_finish'
  fs/built-in.o: In function `proc_sys_lookup':
  proc_sysctl.c:(.text+0x26b39): undefined reference to `sysctl_head_finish'
  make: *** [vmlinux] Virhe 1

All those functions are in fs/proc/proc_sysctl.c, which has no CONFIG_
#define's in it, so the patch makes the compilation of that file to depend
on CONFIG_PROC_SYSCTL (the simplest choice).

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-27 09:05:16 -07:00
J. Bruce Fields
79f6523a16 [PATCH] knfsd: nfsd4: remove superfluous cancel_delayed_work() call
This cancel_delayed_work call is called from a function that is only called
from a piece of code that immediate follows a cancel and destruction of the
workqueue, so it's clearly a mistake.

Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-27 09:05:14 -07:00