Commit Graph

6640 Commits

Author SHA1 Message Date
Ingo Molnar
2d72376b3a sched: clean up schedstats, cnt -> count
rename all 'cnt' fields and variables to the less yucky 'count' name.

yuckage noticed by Andrew Morton.

no change in code, other than the /proc/sched_debug bkl_count string got
a bit larger:

   text    data     bss     dec     hex filename
  38236    3506      24   41766    a326 sched.o.before
  38240    3506      24   41770    a32a sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:12 +02:00
Al Viro
5ba253313d more low-hanging fruits - kernel, fs, lib signedness
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-14 12:41:52 -07:00
Al Viro
b1519d047c fs/partitions/sun.c endianness annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-14 12:41:51 -07:00
Peter Zijlstra
14358e6dda lockdep: annotate dir vs file i_mutex
On Mon, 2007-09-24 at 22:13 -0400, Steven Rostedt wrote:
> The circular lock seems to be this:
> 
> #1:
> 
>   sys_mmap2:              down_write(&mm->mmap_sem);
>   nfs_revalidate_mapping: mutex_lock(&inode->i_mutex);
> 
> 
> #0:
> 
>   vfs_readdir:     mutex_lock(&inode->i_mutex);
>    - during the readdir (filldir64), we take a user fault (missing page?)
>     and call do_page_fault -
>   do_page_fault:   down_read(&mm->mmap_sem);
> 
> 
> So it does indeed look like a circular locking. Now the question is, "is
> this a bug?".  Looking like the inode of #1 must be a file or something
> else that you can mmap and the inode of #0 seems it must be a directory.
> I would say "no".
> 
> Now if you can readdir on a file or mmap a directory, then this could be
> an issue.
> 
> Otherwise, I'd love to see someone teach lockdep about this issue! ;-)

Make a distinction between file and dir usage of i_mutex.
The inode should be complete and unused at unlock_new_inode(), re-init
i_mutex depending on its type.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-14 01:38:33 +02:00
Peter Zijlstra
d475fd428c lockdep: per filesystem inode lock class
Give each filesystem its own inode lock class. The various filesystems have
different locking order wrt the inode locks; esp. the pseudo filesystems differ
from the rest.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-15 14:51:31 +02:00
Dave Kleikamp
8d8fe64237 JFS: Bio cleanup: Replace missing return statements
commit e30408b2a9 ("JFS: fix bio-related
build breakage") removed some "return 0;" statements, rather than
changing them to null returns.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-13 11:14:04 -07:00
David Woodhouse
ebf8889bd1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2007-10-13 14:58:23 +01:00
David Woodhouse
b160292cc2 Merge Linux 2.6.23 2007-10-13 14:43:54 +01:00
David Woodhouse
4fc8a60786 [JFFS2] Remove stray debugging printk
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-10-13 14:29:39 +01:00
David Woodhouse
b534e70cf5 [JFFS2] Handle dirents on the flash with embedded zero bytes in names.
In three places: summary scan, normal scan, REF_PRISTINE GC.

Just truncate at the NUL, since that was the correct thing to do in the
only case where this (inexplicable) breakage has been seen.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-10-13 11:35:58 +01:00
David Woodhouse
69ca4378aa [JFFS2] Check for creation of dirents with embedded zero bytes in name.
I have no idea how this happened, but OLPC trac #4184 suggests that it
did. Catch it early.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-10-13 11:33:50 +01:00
David Woodhouse
a8c68f3264 [JFFS2] Don't count all 'very dirty' blocks except in debug mode
... where we'll actually print the count in a debug message.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-10-13 11:32:16 +01:00
David Woodhouse
2665ea842d [JFFS2] Check whether garbage-collection actually obsoleted its victim.
In OLPC trac #4184 we found a case where a corrupted node didn't
actually get obsoleted when we tried to garbage-collect it. So we wrote
out many million copies of it, in repeated attempts to obsolete it,
until the flash became full. Don't Do That.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-10-13 11:31:23 +01:00
David Woodhouse
85becc535b [JFFS2] Relax threshold for triggering GC due to dirty blocks.
Instead of matching resv_blocks_gcmerge, which is only about 3, instead
match resv_blocks_gctrigger, which includes a proportion of the total
device size.

These ought to become tunable from userspace, at some point.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-10-13 11:29:07 +01:00
Linus Torvalds
efefc6eb38 Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (75 commits)
  PM: merge device power-management source files
  sysfs: add copyrights
  kobject: update the copyrights
  kset: add some kerneldoc to help describe what these strange things are
  Driver core: rename ktype_edd and ktype_efivar
  Driver core: rename ktype_driver
  Driver core: rename ktype_device
  Driver core: rename ktype_class
  driver core: remove subsystem_init()
  sysfs: move sysfs file poll implementation to sysfs_open_dirent
  sysfs: implement sysfs_open_dirent
  sysfs: move sysfs_dirent->s_children into sysfs_dirent->s_dir
  sysfs: make sysfs_root a regular directory dirent
  sysfs: open code sysfs_attach_dentry()
  sysfs: make s_elem an anonymous union
  sysfs: make bin attr open get active reference of parent too
  sysfs: kill unnecessary NULL pointer check in sysfs_release()
  sysfs: kill unnecessary sysfs_get() in open paths
  sysfs: reposition sysfs_dirent->s_mode.
  sysfs: kill sysfs_update_file()
  ...
2007-10-12 15:49:37 -07:00
Linus Torvalds
a6e3d7dba9 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (23 commits)
  ocfs2: Optionally return filldir errors
  ocfs2: Write support for directories with inline data
  ocfs2: Read support for directories with inline data
  ocfs2: Write support for inline data
  ocfs2: Read support for inline data
  ocfs2: Structure updates for inline data
  ocfs2: Cleanup dirent size check
  ocfs2: Rename cleanups
  ocfs2: Provide convenience function for ino lookup
  ocfs2: Implement ocfs2_empty_dir() as a caller of ocfs2_dir_foreach()
  ocfs2: Remove open coded readdir()
  ocfs2: Pass raw u64 to filldir
  ocfs2: Abstract out core dir listing functionality
  ocfs2: Move directory manipulation code into dir.c
  ocfs2: Small refactor of truncate zeroing code
  ocfs2: move nonsparse hole-filling into ocfs2_write_begin()
  ocfs2: Sync ocfs2_fs.h with ocfs2-tools
  [PATCH] fs/ocfs2/: removed unneeded initial value and function's return value
  ocfs2: Implement show_options()
  ocfs2: Clear slot map when umounting a local volume
  ...
2007-10-12 15:04:00 -07:00
Tejun Heo
6d66f5cd26 sysfs: add copyrights
Sysfs has gone through considerable amount of reimplementation.  Add
copyrights.  Any objections?  :-)

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:12 -07:00
Tejun Heo
a4e8b91254 sysfs: move sysfs file poll implementation to sysfs_open_dirent
Sysfs file poll implementation is scattered over sysfs and kobject.
Event numbering is done in sysfs_dirent but wait itself is done on
kobject.  This not only unecessarily bloats both kobject and
sysfs_dirent but is also buggy - if a sysfs_dirent is removed while
there still are pollers, the associaton betwen the kobject and
sysfs_dirent breaks and kobject may be freed with the pollers still
sleeping on it.

This patch moves whole poll implementation into sysfs_open_dirent.
Each time a sysfs_open_dirent is created, event number restarts from 1
and pollers sleep on sysfs_open_dirent.  As event sequence number is
meaningless without any open file and pollers should have open file
and thus sysfs_open_dirent, this ephemeral event counting works and is
a saner implementation.

This patch fixes the dnagling sleepers bug and reduces the sizes of
kobject and sysfs_dirent by one pointer.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:11 -07:00
Tejun Heo
85a4ffad3d sysfs: implement sysfs_open_dirent
Implement sysfs_open_dirent which represents an open file (attribute)
sysfs_dirent.  A file sysfs_dirent with one or more open files have
one sysfs_dirent and all sysfs_buffers (one for each open instance)
are linked to it.

sysfs_open_dirent doesn't actually do anything yet but will be used to
off-load things which are specific for open file sysfs_dirent from it.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:11 -07:00
Tejun Heo
bc747f37a0 sysfs: move sysfs_dirent->s_children into sysfs_dirent->s_dir
Children list head is only meaninful for directory nodes.  Move it
into s_dir.  This doesn't save any space currently but it will with
further changes.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:11 -07:00
Tejun Heo
dc2f75f0e0 sysfs: make sysfs_root a regular directory dirent
sysfs_root is different from a regular directory dirent in that it's
of type SYSFS_ROOT and doesn't have a name.  These differences aren't
used by anybody and only adds to complexity.  Make sysfs_root a
regular directory dirent.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:11 -07:00
Tejun Heo
d6b4fd2fae sysfs: open code sysfs_attach_dentry()
sysfs_attach_dentry() now has only one caller and isn't doing much
other than obfuscating the code.  Open code and kill it.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:11 -07:00
Tejun Heo
b1fc3d6144 sysfs: make s_elem an anonymous union
Make s_elem an anonymous union.  Prefixing with s_elem makes things
needlessly longer without any advantage.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:10 -07:00
Tejun Heo
078ce6409c sysfs: make bin attr open get active reference of parent too
All bin attr operations require active references of itself and its
parent.  There's no reason to allow open when its parent has been
deactivated and allowing it is inconsistent with regular sysfs file.
Use sysfs_get_active_two() in bin attribute open function.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:10 -07:00
Tejun Heo
50ab1a7286 sysfs: kill unnecessary NULL pointer check in sysfs_release()
In sysfs_release(), sysfs_buffer pointed to by filp->private_data is
guaranteed to exist.  Kill the unnecessary NULL check.  This also
makes the code more consistent with the counterpart in fs/sysfs/bin.c.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:10 -07:00
Tejun Heo
b05f0548da sysfs: kill unnecessary sysfs_get() in open paths
There's no reason to get an extra reference to sysfs_dirent for an
open file.  Open file has a reference to the dentry which in turn has
a reference to sysfs_dirent.  This is fairly obvious as otherwise open
itself won't be able to access the sysfs_dirent.  Kill the extra
sysfs_get() and matching sysfs_put().

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:10 -07:00
Tejun Heo
b13dc89c5a sysfs: reposition sysfs_dirent->s_mode.
Move s_mode downward such that it's side-by-side with s_iattr which is
used for the same thing.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:10 -07:00
Tejun Heo
5a7ad7f044 sysfs: kill sysfs_update_file()
sysfs_update_file() depends on inode->i_mtime but sysfs iondes are now
reclaimable making the reported modification time unreliable.  There's
only one user (pci hotplug) of this notification mechanism and it
reportedly isn't utilized from userland.

Kill sysfs_update_file().

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:09 -07:00
Tejun Heo
59f6901568 sysfs: clean up header files
sysfs is about to go through major overhaul making this a pretty good
opportunity to clean up (out-of-tree changes and pending patches will
need regeneration anyway).  Clean up headers.

* Kill space between * and symbolname.

* Move SYSFS_* type constants and flags into fs/sysfs/sysfs.h.
  They're internal to sysfs.

* Reformat function prototypes and add argument symbol names.

* Make dummy function definition order match that of function
  prototypes.

* Add some comments.

* Reorganize fs/sysfs/sysfs.h according to which file the declared
  variable or feature lives in.

This patch does not introduce any behavior change.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:09 -07:00
Tejun Heo
f88123eaf9 sysfs: fix sysfs_chmod_file() such that it updates sd->s_mode too
sysfs_chmod_file() looked and updated only inode of the target file.
Dentry and inode are reclaimable and the update mode data will go away
when the inode is reclaimed.  This patch makes sysfs_chmod_file()
update sd->s_mode too such that the change is permanent.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:09 -07:00
Tejun Heo
181b2e4be1 sysfs: fix comments of sysfs_add/remove_one()
sysfs_add/remove_one() now link and unlink the target dirent into and
from the children list.  Update comments accordingly.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:09 -07:00
Greg Kroah-Hartman
5c3e8964ce sysfs: spit a warning to users when they try to create a duplicate sysfs file
We want to let people know when we create a duplicate sysfs file, as
they need to fix up their code.


Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:09 -07:00
Eric W. Biederman
45aaae9c51 sysfs: Rewrite sysfs_move_dir in terms of sysfs dirents
This patch rewrites sysfs_move_dir to perform it's checks
as much as possible on the underlying sysfs_dirents instead
of the contents of the dcache, making sysfs_move_dir
more like the rest of the sysfs directory modification
code.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:08 -07:00
Eric W. Biederman
9918f9a481 sysfs: Rewrite rename in terms of sysfs dirents
This patch rewrites sysfs_rename_dir to perform it's checks
as much as possible on the underlying sysfs_dirents instead
of the contents of the dcache.  It turns out that this version
is a little simpler, and a little more like the rest of
the sysfs directory modification code.

tj: fixed double locking of sysfs_mutex

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:08 -07:00
Eric W. Biederman
5a26b79c42 sysfs: Remove s_dentry
The only uses of s_dentry left are the code that maintains
s_dentry and trivial users that don't actually need it.
So this patch removes the s_dentry maintenance code and
restructures the trivial uses to use something else.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:08 -07:00
Tejun Heo
e0712bbfd9 sysfs: simply sysfs_get_dentry
Now that we know the sysfs tree structure cannot change under us and
sysfs shadow support is dropped, sysfs_get_dentry() can be simplified
greatly.  It can just look up from the root and there's no need to
retry on failure.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:08 -07:00
Eric W. Biederman
932ea2e374 sysfs: Introduce sysfs_rename_mutex
Looking carefully at the rename code we have a subtle dependency
that the structure of sysfs not change while we are performing
a rename.  If the parent directory of the object we are renaming
changes while the rename is being performed nasty things could
happen when we go to release our locks.

So introduce a sysfs_rename_mutex to prevent this highly
unlikely theoretical issue.

In addition hold sysfs_rename_mutex over all calls to
sysfs_get_dentry. Allowing sysfs_get_dentry to be simplified
in the future.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:08 -07:00
Eric W. Biederman
89bec09705 sysfs: Rewrite sysfs_drop_dentry.
Currently we find the dentry to drop by looking at sd->s_dentry.
We can just as easily accomplish the same task by looking up the
sysfs inode and finding all of the dentries from there, with the
added bonus that we don't need to play with the sysfs_assoc_lock.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:08 -07:00
Eric W. Biederman
3efa65b92d sysfs: Simplify readdir.
At some point someone wrote sysfs_readdir to insert a cursor
into the list of sysfs_dirents to ensure that sysfs_readdir would
restart properly.  That works but it is complex code and tends
to be expensive.

The same effect can be achieved by keeping the sysfs_dirents in
inode order and using the inode number as the f_pos.  Then
when we restart we just have to find the first dirent whose inode
number is equal or greater then the last sysfs_dirent we attempted
to return.

Removing the sysfs directory cursor also allows the remove of
all of the mysterious checks for sysfs_type(sd) != 0.   Which
were nonbovious checks to see if a cursor was in a directory list.

tj: offset marker for EOF is changed from UINT_MAX to INT_MAX to avoid
    overflow in case offset is 32bit.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:08 -07:00
Eric W. Biederman
94777e0918 sysfs: In sysfs_lookup don't open code sysfs_find_dirent
This is a small cleanup patch that makes the code just
a little bit cleaner.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:08 -07:00
Eric W. Biederman
7d0c7d676c sysfs: Make sysfs_mount static
This patch modifies the users of sysfs_mount to use sysfs_root
instead (which is what they are looking for).  It then
makes sysfs_mount static to keep people from using it
by accident.

The net result is slightly faster and cleaner code.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:07 -07:00
Eric W. Biederman
0333cd8a3f sysfs: Use kill_anon_super
Since sysfs no longer stores fs directory information in the dcache
on a permanent basis kill_litter_super it is inappropriate and actively
wrong.  It will decrement the count on all dentries left in the
dcache before trying to free them.

At the moment this is not biting us only because we never unmount sysfs.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:07 -07:00
Eric W. Biederman
119dd52be3 sysfs: Remove sysfs_instantiate
Now that sysfs_get_inode is dropping the inode lock
we no longer have a need from sysfs_instantiate.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:07 -07:00
Eric W. Biederman
372e88bd19 sysfs: Move all of inode initialization into sysfs_init_inode
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:07 -07:00
Tejun Heo
253280267a sysfs: fix i_mutex locking in sysfs_get_dentry()
lookup_one_len_kern() should be called with the parent's i_mutex
locked.  Fix it.

Spotted by Eric W. Biederman.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:07 -07:00
Rolf Eike Beer
a93720eeb4 sysfs: Fix typos in fs/sysfs/file.c
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:06 -07:00
Tejun Heo
990e53f880 sysfs: make sysfs_addrm_finish() return void
With the previous sysfs_add_one() update, there is only one user of
the return value of sysfs_addrm_finish() and the user can switch to
testing @sd easily.  Make sysfs_addrm_finish() return void for cleaner
semantics as suggested by Satyam Sharma.

This patch doesn't introduce any noticeable behavior change.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Satyam Sharma <satyam.sharma@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:04 -07:00
Tejun Heo
23dc279950 sysfs: make sysfs_add_one() automatically check for duplicate entry
Make sysfs_add_one() check for duplicate entry and return -EEXIST if
such entry exists.  This simplifies node addition code a bit.

This patch doesn't introduce any noticeable behavior change.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:04 -07:00
Tejun Heo
41fc1c2745 sysfs: make sysfs_add/remove_one() call link/unlink_sibling() implictly
When adding or removing a sysfs_dirent, the user used to be required
to call link/unlink separately.  It was for two reasons - code looked
like that before sysfs_addrm_cxt conversion and to avoid looping
through parent_sd->children list twice during removal.

Performance optimization during removal just isn't worth it.  Make
sysfs_add/remove_one() call sysfs_link/unlink_sibing() implicitly.
This makes code simpler albeit slightly less efficient.  This change
doesn't introduce any noticeable behavior change.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:03 -07:00
Tejun Heo
ff869de7bf sysfs: simplify sysfs_rename_dir()
With the shadow directories gone, sysfs_rename_dir() can be simplified.

* parent doesn't need to be grabbed separately.  Just access
  old_dentry->d_parent.

* parent sd can never change.  Remove code to move under the new
  parent.

* Massage comments a bit.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:03 -07:00
Tejun Heo
a7a0475497 sysfs: cosmetic changes in sysfs_lookup()
* remove space between * and symbol name in variable declaration.

* kill unnecessary new line.

* kill 'found' and test 'sd' instead.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:03 -07:00
Eric W. Biederman
90bc61359d sysfs: Remove first pass at shadow directory support
While shadow directories appear to be a good idea, the current scheme
of controlling their creation and destruction outside of sysfs appears
to be a locking and maintenance nightmare in the face of sysfs
directories dynamically coming and going.  Which can now occur for
directories containing network devices when CONFIG_SYSFS_DEPRECATED is
not set.

This patch removes everything from the initial shadow directory support
that allowed the shadow directory creation to be controlled at a higher
level.  So except for a few bits of sysfs_rename_dir everything from
commit b592fcfe7f is now gone.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:03 -07:00
Dave Young
869512ab5a sysfs: cleanup semaphore.h
Cleanup semaphore.h

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:03 -07:00
Dave Young
52e8c209d6 sysfs/file.c - use mutex instead of semaphore
Use mutex instead of semaphore in sysfs/file.c : sys_buffer.

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:03 -07:00
Robin Getz
2ebefc5016 debugfs: helper for decimal challenged
Allows debugfs helper functions to have a hex output, rather than just decimal

Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:03 -07:00
Greg Kroah-Hartman
34980ca8fa Drivers: clean up direct setting of the name of a kset
A kset should not have its name set directly, so dynamically set the
name at runtime.

This is needed to remove the static array in the kobject structure which
will be changed in a future patch.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:02 -07:00
Greg Kroah-Hartman
19c38de88a kobjects: fix up improper use of the kobject name field
A number of different drivers incorrect access the kobject name field
directly.  This is not correct as the name might not be in the array.
Use the proper accessor function instead.
2007-10-12 14:51:02 -07:00
Mark Fasheh
e7b3401960 ocfs2: Optionally return filldir errors
Modify ocfs2_dir_foreach_blk() to optionally return any error from the
filldir callback. This way ocfs2_dirforeach() can terminate early, as
opposed to always passing through the entire directory. This fixes a bug
introduced during a previous code refactor where ocfs2_empty_dir() would
loop infinitely.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:41 -07:00
Mark Fasheh
5b6a3a2b4a ocfs2: Write support for directories with inline data
Create all new directories with OCFS2_INLINE_DATA_FL and the inline data
bytes formatted as an empty directory. Inode size field reflects the actual
amount of inline data available, which makes searching for dirent space
very similar to the regular directory search.

Inline-data directories are automatically pushed out to extents on any
insert request which is too large for the available space.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:41 -07:00
Mark Fasheh
23193e513d ocfs2: Read support for directories with inline data
This splits out extent based directory read support and implements
inline-data versions of those functions. All knowledge of inline-data versus
extent based directories is internalized. For lookups the code uses
ocfs2_find_entry_id(), full dir iterations make use of
ocfs2_dir_foreach_blk_id().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:40 -07:00
Mark Fasheh
1afc32b952 ocfs2: Write support for inline data
This fixes up write, truncate, mmap, and RESVSP/UNRESVP to understand inline
inode data.

For the most part, the changes to the core write code can be relied on to do
the heavy lifting. Any code calling ocfs2_write_begin (including shared
writeable mmap) can count on it doing the right thing with respect to
growing inline data to an extent tree.

Size reducing truncates, including UNRESVP can simply zero that portion of
the inode block being removed. Size increasing truncatesm, including RESVP
have to be a little bit smarter and grow the inode to an extent tree if
necessary.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:40 -07:00
Mark Fasheh
6798d35a31 ocfs2: Read support for inline data
This hooks up ocfs2_readpage() to populate a page with data from an inode
block. Direct IO reads from inline data are modified to fall back to
buffered I/O. Appropriate checks are also placed in the extent map code to
avoid reading an extent list when inline data might be stored.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:39 -07:00
Mark Fasheh
15b1e36bdb ocfs2: Structure updates for inline data
Add the disk, network and memory structures needed to support data in inode.

Struct ocfs2_inline_data is defined and embedded in ocfs2_dinode for storing
inline data.

A new inode field, i_dyn_features, is added to facilitate tracking of
dynamic inode state. Since it will be used often, we want to mirror it on
ocfs2_inode_info, and transfer it via the meta data lvb.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:39 -07:00
Mark Fasheh
8553cf4f36 ocfs2: Cleanup dirent size check
The check to see if a new dirent would fit in an old one is pretty ugly, and
it's done at least twice. Clean things up by putting this in it's own
easier-to-read function.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:39 -07:00
Mark Fasheh
38760e2432 ocfs2: Rename cleanups
ocfs2_rename() does direct manipulation of the dirent it's gotten back from
a directory search. Wrap this manipulation inside of a function so that we
can transparently change directory update behavior in the future. As an
added bonus, this gets rid of an ugly macro.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:38 -07:00
Mark Fasheh
be94d11704 ocfs2: Provide convenience function for ino lookup
A couple paths which needed to just match a parent dir + name pair to an
inode number were a bit messy because they had to deal with
ocfs2_find_files_on_disk() which returns a larger number of values. Provide
a convenience function, ocfs2_lookup_ino_from_name() which internalizes all
the extra accounting.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:38 -07:00
Mark Fasheh
0bfbbf62a8 ocfs2: Implement ocfs2_empty_dir() as a caller of ocfs2_dir_foreach()
We can preserve the behavior of ocfs2_empty_dir(), while getting rid of the
open coded directory walk by just providing a smart filldir callback. This
also automatically gets to use the dir readahead code, though in this case
any advantage is minor at best.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:37 -07:00
Mark Fasheh
5eae5b96fc ocfs2: Remove open coded readdir()
ocfs2_queue_orphans() has an open coded readdir loop which can easily just
use a directory accessor function.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:37 -07:00
Mark Fasheh
7e8536797d ocfs2: Pass raw u64 to filldir
filldir_t can take this, so don't turn de->inode into a 32 bit value. Right
now this doesn't make a difference since no ocfs2 inodes overflow that, but
it could be a nasty surprise later on if some kernel code is calling
ocfs2_dir_foreach_blk() and expecting real inode numbers back...

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:37 -07:00
Mark Fasheh
b8bc5f4fde ocfs2: Abstract out core dir listing functionality
Put this in it's own function so that the functionality can be overridden.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:36 -07:00
Mark Fasheh
316f4b9f98 ocfs2: Move directory manipulation code into dir.c
The code for adding, removing, deleting directory entries was splattered all
over namei.c. I'd rather have this all centralized, so that it's easier to
make changes for inline dir data, and eventually indexed directories.

None of the code in any of the functions was changed. I only removed the
static keyword from some prototypes so that they could be exported.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:36 -07:00
Mark Fasheh
1d410a6e33 ocfs2: Small refactor of truncate zeroing code
We'll want to reuse most of this when pushing inline data back out to an
extent. Keeping this part as a seperate patch helps to keep the upcoming
changes for write support uncluttered.

The core portion of ocfs2_zero_cluster_pages() responsible for making sure a
page is mapped and properly dirtied is abstracted out into it's own
function, ocfs2_map_and_dirty_page(). Actual functionality doesn't change,
though zeroing becomes optional.

We also turn part of ocfs2_free_write_ctxt() into  a common function for
unlocking and freeing a page array. This operation is very common (and
uniform) for Ocfs2 cluster sizes greater than page size, so it makes sense
to keep the code in one place.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:35 -07:00
Mark Fasheh
65ed39d6ca ocfs2: move nonsparse hole-filling into ocfs2_write_begin()
By doing this, we can remove any higher level logic which has to have
knowledge of btree functionality - any callers of ocfs2_write_begin() can
now expect it to do anything necessary to prepare the inode for new data.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12 11:54:35 -07:00
Mark Fasheh
92e91ce2a3 ocfs2: Sync ocfs2_fs.h with ocfs2-tools
ocfs2-tools added some on-disk fields and flags which are used by
tunefs.ocfs2.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:34 -07:00
Denis Cheng
bddb8eb37f [PATCH] fs/ocfs2/: removed unneeded initial value and function's return value
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:34 -07:00
Sunil Mushran
d550071c03 ocfs2: Implement show_options()
Implement sops->show_options() so as to allow /proc/mounts to show the mount
options.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:33 -07:00
Mark Fasheh
19b613d410 ocfs2: Clear slot map when umounting a local volume
This is technically harmless (recovery will clean it out later), but leaves
a bogus entry in the slot_map which really shouldn't be there.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:33 -07:00
Mark Fasheh
015452b15f ocfs2: Remove unused structure field
c_used_tail_recs in struct ocfs2_merge_ctxt is only ever set, so we can
remove it.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:32 -07:00
Tao Mao
518d7269f3 ocfs2: remove unused variable
delete_tail_recs in ocfs2_try_to_merge_extent() was only ever set, remove
it.

Signed-off-by: Tao Mao <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:32 -07:00
Tao Mao
c77534f6fb ocfs2: remove mostly unused field from insert structure
ocfs2_insert_type->ins_free_records was only used in one place, and was set
incorrectly in most places. We can free up some memory and lose some code by
removing this.

* Small warning fixup contributed by Andrew Mortom <akpm@linux-foundation.org>

Signed-off-by: Tao Mao <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12 11:54:32 -07:00
Anton Altaparmakov
bfab36e816 NTFS: Fix a mount time deadlock.
Big thanks go to Mathias Kolehmainen for reporting the bug, providing
debug output and testing the patches I sent him to get it working.

The fix was to stop calling ntfs_attr_set() at mount time as that causes
balance_dirty_pages_ratelimited() to be called which on systems with
little memory actually tries to go and balance the dirty pages which tries
to take the s_umount semaphore but because we are still in fill_super()
across which the VFS holds s_umount for writing this results in a
deadlock.

We now do the dirty work by hand by submitting individual buffers.  This
has the annoying "feature" that mounting can take a few seconds if the
journal is large as we have clear it all.  One day someone should improve
on this by deferring the journal clearing to a helper kernel thread so it
can be done in the background but I don't have time for this at the moment
and the current solution works fine so I am leaving it like this for now.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-12 09:16:30 -07:00
Linus Torvalds
f26e51f67a Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (51 commits)
  [DLM] block dlm_recv in recovery transition
  [DLM] don't overwrite castparam if it's NULL
  [GFS2] Get superblock a different way
  [GFS2] Don't try to remove buffers that don't exist
  [GFS2] Alternate gfs2_iget to avoid looking up inodes being freed
  [GFS2] Data corruption fix
  [GFS2] Clean up journaled data writing
  [GFS2] GFS2: chmod hung - fix race in thread creation
  [DLM] Make dlm_sendd cond_resched more
  [GFS2] Move inode deletion out of blocking_cb
  [GFS2] flocks from same process trip kernel BUG at fs/gfs2/glock.c:1118!
  [GFS2] Clean up gfs2_trans_add_revoke()
  [GFS2] Use slab operations for all gfs2_bufdata allocations
  [GFS2] Replace revoke structure with bufdata structure
  [GFS2] Fix ordering of dirty/journal for ordered buffer unstuffing
  [GFS2] Clean up ordered write code
  [GFS2] Move pin/unpin into lops.c, clean up locking
  [GFS2] Don't mark jdata dirty in gfs2_unstuffer_page()
  [GFS2] Introduce gfs2_remove_from_ail
  [GFS2] Correct lock ordering in unlink
  ...
2007-10-12 09:14:51 -07:00
Al Viro
782e3b3b38 Fix up more bio fallout
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-12 00:29:50 -07:00
Linus Torvalds
e86908614f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (408 commits)
  [POWERPC] Add memchr() to the bootwrapper
  [POWERPC] Implement logging of unhandled signals
  [POWERPC] Add legacy serial support for OPB with flattened device tree
  [POWERPC] Use 1TB segments
  [POWERPC] XilinxFB: Allow fixed framebuffer base address
  [POWERPC] XilinxFB: Add support for custom screen resolution
  [POWERPC] XilinxFB: Use pdata to pass around framebuffer parameters
  [POWERPC] PCI: Add 64-bit physical address support to setup_indirect_pci
  [POWERPC] 4xx: Kilauea defconfig file
  [POWERPC] 4xx: Kilauea DTS
  [POWERPC] 4xx: Add AMCC Kilauea eval board support to platforms/40x
  [POWERPC] 4xx: Add AMCC 405EX support to cputable.c
  [POWERPC] Adjust TASK_SIZE on ppc32 systems to 3GB that are capable
  [POWERPC] Use PAGE_OFFSET to tell if an address is user/kernel in SW TLB handlers
  [POWERPC] 85xx: Enable FP emulation in MPC8560 ADS defconfig
  [POWERPC] 85xx: Killed <asm/mpc85xx.h>
  [POWERPC] 85xx: Add cpm nodes for 8541/8555 CDS
  [POWERPC] 85xx: Convert mpc8560ads to the new CPM binding.
  [POWERPC] mpc8272ads: Remove muram from the CPM reg property.
  [POWERPC] Make clockevents work on PPC601 processors
  ...

Fixed up conflict in Documentation/powerpc/booting-without-of.txt manually.
2007-10-11 21:55:47 -07:00
Jeff Garzik
e30408b2a9 JFS: fix bio-related build breakage
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-11 21:15:14 -07:00
Linus Torvalds
038a5008b2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (867 commits)
  [SKY2]: status polling loop (post merge)
  [NET]: Fix NAPI completion handling in some drivers.
  [TCP]: Limit processing lost_retrans loop to work-to-do cases
  [TCP]: Fix lost_retrans loop vs fastpath problems
  [TCP]: No need to re-count fackets_out/sacked_out at RTO
  [TCP]: Extract tcp_match_queue_to_sack from sacktag code
  [TCP]: Kill almost unused variable pcount from sacktag
  [TCP]: Fix mark_head_lost to ignore R-bit when trying to mark L
  [TCP]: Add bytes_acked (ABC) clearing to FRTO too
  [IPv6]: Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493, try2
  [NETFILTER]: x_tables: add missing ip6t_modulename aliases
  [NETFILTER]: nf_conntrack_tcp: fix connection reopening
  [QETH]: fix qeth_main.c
  [NETLINK]: fib_frontend build fixes
  [IPv6]: Export userland ND options through netlink (RDNSS support)
  [9P]: build fix with !CONFIG_SYSCTL
  [NET]: Fix dev_put() and dev_hold() comments
  [NET]: make netlink user -> kernel interface synchronious
  [NET]: unify netlink kernel socket recognition
  [NET]: cleanup 3rd argument in netlink_sendskb
  ...

Fix up conflicts manually in Documentation/feature-removal-schedule.txt
and my new least favourite crap, the "mod_devicetable" support in the
files include/linux/mod_devicetable.h and scripts/mod/file2alias.c.

(The latter files seem to be explicitly _designed_ to get conflicts when
different subsystems work with them - that have an absolutely horrid
lack of subsystem separation!)

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-11 19:40:14 -07:00
Peter Zijlstra
34a3d1e837 lockdep: annotate journal_start()
On Fri, 2007-07-13 at 02:05 -0700, Andrew Morton wrote:

> Except lockdep doesn't know about journal_start(), which has ranking
> requirements similar to a semaphore.  

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-11 22:11:12 +02:00
Trond Myklebust
05c88babab NFSv4: Fix a typo in nfs_inode_reclaim_delegation
We were intending to put the previous instance of delegation->cred
before setting a new one.

Thanks to David Howells for spotting this.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-11 15:11:51 -04:00
Denis V. Lunev
cd40b7d398 [NET]: make netlink user -> kernel interface synchronious
This patch make processing netlink user -> kernel messages synchronious.
This change was inspired by the talk with Alexey Kuznetsov about current
netlink messages processing. He says that he was badly wrong when introduced 
asynchronious user -> kernel communication.

The call netlink_unicast is the only path to send message to the kernel
netlink socket. But, unfortunately, it is also used to send data to the
user.

Before this change the user message has been attached to the socket queue
and sk->sk_data_ready was called. The process has been blocked until all
pending messages were processed. The bad thing is that this processing
may occur in the arbitrary process context.

This patch changes nlk->data_ready callback to get 1 skb and force packet
processing right in the netlink_unicast.

Kernel -> user path in netlink_unicast remains untouched.

EINTR processing for in netlink_run_queue was changed. It forces rtnl_lock
drop, but the process remains in the cycle until the message will be fully
processed. So, there is no need to use this kludges now.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 21:15:29 -07:00
Pavel Emelyanov
39699037a5 [FS] seq_file: Introduce the seq_open_private()
This function allocates the zeroed chunk of memory and
call seq_open(). The __seq_open_private() helper returns
the allocated memory to make it possible for the caller
to initialize it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:55:33 -07:00
Pavel Emelyanov
4665079cbb [NETNS]: Move some code into __init section when CONFIG_NET_NS=n
With the net namespaces many code leaved the __init section,
thus making the kernel occupy more memory than it did before.
Since we have a config option that prohibits the namespace
creation, the functions that initialize/finalize some netns
stuff are simply not needed and can be freed after the boot.

Currently, this is almost not noticeable, since few calls
are no longer in __init, but when the namespaces will be
merged it will be possible to free more code. I propose to
use the __net_init, __net_exit and __net_initdata "attributes"
for functions/variables that are not used if the CONFIG_NET_NS
is not set to save more space in memory.

The exiting functions cannot just reside in the __exit section,
as noticed by David, since the init section will have
references on it and the compilation will fail due to modpost
checks. These references can exist, since the init namespace
never dies and the exit callbacks are never called. So I
introduce the __exit_refok attribute just like it is already
done with the __init_refok.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:58 -07:00
Eric W. Biederman
077130c0cf [NET]: Fix race when opening a proc file while a network namespace is exiting.
The problem:  proc_net files remember which network namespace the are
against but do not remember hold a reference count (as that would pin
the network namespace).   So we currently have a small window where
the reference count on a network namespace may be incremented when opening
a /proc file when it has already gone to zero.

To fix this introduce maybe_get_net and get_proc_net.

maybe_get_net increments the network namespace reference count only if it is
greater then zero, ensuring we don't increment a reference count after it
has gone to zero.

get_proc_net handles all of the magic to go from a proc inode to the network
namespace instance and call maybe_get_net on it.

PROC_NET the old accessor is removed so that we don't get confused and use
the wrong helper function.

Then I fix up the callers to use get_proc_net and handle the case case
where get_proc_net returns NULL.  In that case I return -ENXIO because
effectively the network namespace has already gone away so the files
we are trying to access don't exist anymore.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:22 -07:00
Daniel Lezcano
36ac3135f5 [NETNS]: Fix export symbols.
Add the appropriate EXPORT_SYMBOLS for proc_net_create,
proc_net_fops_create and proc_net_remove to fix errors when
compiling allmodconfig

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:16 -07:00
David S. Miller
3c12afe75f [NET]: Fix missed addition of fs/proc/proc_net.c
My bad.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:14 -07:00
Eric W. Biederman
881d966b48 [NET]: Make the device list and device lookups per namespace.
This patch makes most of the generic device layer network
namespace safe.  This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables.  The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl

were modified to take a network namespace argument, and
deal with it.

vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.

So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces.  The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace.  This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.

For now the ifindex generator is left global.

Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.

At the same time there are assumptions in the network stack
that the ifindex of a network device won't change.  Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:10 -07:00
Eric W. Biederman
b4b510290b [NET]: Support multiple network namespaces with netlink
Each netlink socket will live in exactly one network namespace,
this includes the controlling kernel sockets.

This patch updates all of the existing netlink protocols
to only support the initial network namespace.  Request
by clients in other namespaces will get -ECONREFUSED.
As they would if the kernel did not have the support for
that netlink protocol compiled in.

As each netlink protocol is updated to be multiple network
namespace safe it can register multiple kernel sockets
to acquire a presence in the rest of the network namespaces.

The implementation in af_netlink is a simple filter implementation
at hash table insertion and hash table look up time.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:09 -07:00
Eric W. Biederman
457c4cbc5a [NET]: Make /proc/net per network namespace
This patch makes /proc/net per network namespace.  It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass &init_net for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has been updated to be handle multiple network namespaces.

Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:06 -07:00
Eric W. Biederman
32da477a5b [NET]: Don't implement dev_ifname32 inline
The current implementation of dev_ifname makes maintenance difficult
because updates to the implementation of the ioctl have to made in two
places.  So this patch updates dev_ifname32 to do a classic 32/64
structure conversion and call sys_ioctl like the rest of the
compat calls do.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:03 -07:00
David Teigland
c36258b592 [DLM] block dlm_recv in recovery transition
Introduce a per-lockspace rwsem that's held in read mode by dlm_recv
threads while working in the dlm.  This allows dlm_recv activity to be
suspended when the lockspace transitions to, from and between recovery
cycles.

The specific bug prompting this change is one where an in-progress
recovery cycle is aborted by a new recovery cycle.  While dlm_recv was
processing a recovery message, the recovery cycle was aborted and
dlm_recoverd began cleaning up.  dlm_recv decremented recover_locks_count
on an rsb after dlm_recoverd had reset it to zero.  This is fixed by
suspending dlm_recv (taking write lock on the rwsem) before aborting the
current recovery.

The transitions to/from normal and recovery modes are simplified by using
this new ability to block dlm_recv.  The switch from normal to recovery
mode means dlm_recv goes from processing locking messages, to saving them
for later, and vice versa.  Races are avoided by blocking dlm_recv when
setting the flag that switches between modes.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:56:38 +01:00
Patrick Caulfield
b434eda6fd [DLM] don't overwrite castparam if it's NULL
If the castaddr passed to the userland API is NULL then don't overwrite the
existing castparam. This allows a different thread to cancel a lock request and
the CANCEL AST gets delivered to the original thread.

bz#306391 (for RHEL4) refers.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10 08:56:36 +01:00