linux

Author	SHA1	Message	Date
Trond Myklebust	88d9093997	NFSv4: nfs_increment_open_seqid should not return a value It is a void function... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:39 -04:00
Trond Myklebust	e6889620e8	NFSv4: Fix underestimate of NFSv4 lookup request size Also fix up the underestimate of fs_locations Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:39 -04:00
Trond Myklebust	2cebf82883	NFSv4: Fix the underestimate of NFSv4 open request size The maximum size depends on the filename size and a number of other elements which are currently not being counted. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:39 -04:00
Trond Myklebust	bd625ba80d	NFSv4: Fix the NFSv4 owner and owner_group size estimates Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:39 -04:00
Trond Myklebust	7af654f8d1	NFSv4: Don't reuse expired nfs4_state_owner structs That just confuses certain NFSv4 servers. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:38 -04:00
Trond Myklebust	27b3f949b7	NFSv4: Fix a credential reference leak in nfs4_get_state_owner() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:38 -04:00
Trond Myklebust	587142f85f	NFS: Replace NFS_I(inode)->req_lock with inode->i_lock There is no justification for keeping a special spinlock for the exclusive use of the NFS writeback code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:38 -04:00
Trond Myklebust	4e56e082dd	NFSv4: Clean up _nfs4_proc_lookup() vs _nfs4_proc_lookupfh() They differ only slightly in the arguments they take. Why have they not been merged? Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:38 -04:00
Trond Myklebust	1be27f3660	SUNRPC: Remove the tk_auth macro... We should almost always be dereferencing the rpc_auth struct by means of the credential's cr_auth field instead of the rpc_clnt->cl_auth anyway. Fix up that historical mistake, and remove the macro that propagated it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:37 -04:00
Trond Myklebust	f61534dfd3	SUNRPC: Remove redundant calls to rpciod_up()/rpciod_down() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:30 -04:00
Trond Myklebust	90c5755ff5	SUNRPC: Kill rpc_clnt->cl_oneshot Replace it with explicit calls to rpc_shutdown_client() or rpc_destroy_client() (for the case of asynchronous calls). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:29 -04:00
Trond Myklebust	34f52e3591	SUNRPC: Convert rpc_clnt->cl_users to a kref Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:28 -04:00
Trond Myklebust	c6d00e639b	NFSv4: Convert struct nfs4_opendata to use struct kref Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:28 -04:00
Trond Myklebust	3bec63db55	NFS: Convert struct nfs_open_context to use a kref Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:27 -04:00
Trond Myklebust	edc05fc1c2	NFS: reduce latency by using conditional rescheduling in nfs_scan_list Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:27 -04:00
Trond Myklebust	dce34ce298	NFS: Prevent integer overflow in nfs_scan_list() Also ensure that nfs_inode ncommit and npages are large enough to represent all possible values for the number of pages. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:27 -04:00
Trond Myklebust	2aefa10431	NFS: Remove the redundant 'dirty' and 'commit' lists from nfs_inode Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:26 -04:00
Trond Myklebust	5c36968343	NFS cleanup: speed up nfs_scan_commit using radix tree tags Add a tag for requests that are waiting for a COMMIT Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:26 -04:00
Trond Myklebust	9fd367f0f3	NFS cleanup: Rename NFS_PAGE_TAG_WRITEBACK to NFS_PAGE_TAG_LOCKED Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:26 -04:00
Trond Myklebust	c03b402461	NFS: Convert struct nfs_page to use krefs Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:26 -04:00
Trond Myklebust	a50f7951a3	NFS: Fix an Oops in the nfs_access_cache_shrinker() The nfs_access_cache_shrinker may race with nfs_access_zap_cache(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:25 -04:00
Trond Myklebust	e2f032e9ef	NFS: nfs3_proc_create() should use nfs_post_op_update_inode() Also get rid of a redundant call to nfs_setattr_update_inode(). The call to nfs3_proc_setattr() already takes care of that. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:25 -04:00
Jeff Layton	aa53ed541a	NFS4: on an O_EXCL OPEN make sure SETATTR sets the fields holding the verifier The Linux NFS4 client simply skips over the bitmask in an O_EXCL open call and so it doesn't bother to reset any fields that may be holding the verifier. This patch has us save the first two words of the bitmask (which is all the current client has #defines for). The client then later checks this bitmask and turns on the appropriate flags in the sattr->ia_verify field for the following SETATTR call. This patch only currently checks to see if the server used the atime and mtime slots for the verifier (which is what the Linux server uses for this). I'm not sure what other fields the server could reasonably use, but adding checks for others should be trivial. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:25 -04:00
Trond Myklebust	fc6ae3cf48	NFS: Re-enable forced umounts They disappeared some time around 2.6.18. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:25 -04:00
Jeff Layton	83d93f2229	NFS: Use GFP_HIGHUSER for page allocation in nfs_symlink() nfs_symlink() allocates a GFP_KERNEL page for the pagecache. Most pagecache pages are allocated using GFP_HIGHUSER, and there's no reason not to do that in nfs_symlink() as well. Signed-off-by: Jeff Layton <jlayton@redhat.com>	2007-07-10 23:40:25 -04:00
Trond Myklebust	a0356862bc	NFS: Fix nfs_reval_fsid() We don't need to revalidate the fsid on the root directory. It suffices to revalidate it on the current directory. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:24 -04:00
Trond Myklebust	b39e625b6e	NFSv4: Clean up nfs4_call_async() Use rpc_run_task() instead of doing it ourselves. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:24 -04:00
Trond Myklebust	4a35bd41af	NFSv4: Ensure that nfs4_do_close() doesn't race with umount nfs4_do_close() does not currently have any way to ensure that the user won't attempt to unmount the partition while the asynchronous RPC call is completing. This again may cause Oopses in nfs_update_inode(). Add a vfsmount argument to nfs4_close_state to ensure that the partition remains mounted while we're closing the file. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:24 -04:00
Trond Myklebust	ad389da79f	NFSv4: Ensure asynchronous open() calls always pin the mountpoint A number of race conditions may currently ensue if the user presses ^C and then unmounts the partition while an asynchronous open() is in progress. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:24 -04:00
Trond Myklebust	539cd03a57	NFSv4: Cleanup: pass the nfs_open_context to open recovery code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:24 -04:00
Trond Myklebust	88be9f990f	NFS: Replace vfsmount and dentry in nfs_open_context with struct path Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:23 -04:00
Trond Myklebust	de05a0cc2a	NFS: Minor read optimisation... Since PG_uptodate may now end up getting set during the call to nfs_wb_page(), we can avoid putting a read request on the wire in those situations. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:23 -04:00
Trond Myklebust	44dd151d5c	NFS: Don't mark a written page as uptodate until it is on disk The write may fail, so we should not mark the page as uptodate until we are certain that the data has been accepted and written to disk by the server. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:23 -04:00
Trond Myklebust	d9df8d6b38	NFS: Don't fail an O_DIRECT read/write if get_user_pages() returns pages There is no need to fail the entire O_DIRECT read/write just because get_user_pages() returned fewer pages than we requested. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:23 -04:00
Chuck Lever	070ea60214	NFS: Clean ups in fs/nfs/direct.c Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-07-10 23:40:23 -04:00
Pavel Emelianov	bcf67e1625	Make common helpers for seq_files that work with list_heads Many places in the kernel use the seq_file API to iterate over a regular list_head. The code for such iteration is identical in all of these places, so it's worth introducing common helpers. This makes the code about 300 lines smaller. The first version of this patch made the helper functions static inline in the seq_file.h header. This patch moves them to fs/seq_file.c, as Andrew proposed. The vmlinux .text section sizes are as follows: 2.6.22-rc1-mm1: 0x001794d5; with the previous version: 0x00179505; with this patch: 0x00179135. The config file used was make allnoconfig with "y" for all the options needed to make the files modified by the patch compile, plus the drivers I have on the test node. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-10 17:51:13 -07:00
Eric Sandeen	54c57dc3b6	[PATCH] ocfs2: zero_user_page conversion Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:32:10 -07:00
Mark Fasheh	b25801038d	ocfs2: Support xfs style space reservation ioctls We re-use the RESVSP/UNRESVSP ioctls from xfs which allow the user to allocate and deallocate regions to a file without zeroing data or changing i_size. Though renamed, the structure passed in from user is identical to struct xfs_flock64. The three fields that are actually used right now are l_whence, l_start and l_len. This should get ocfs2 immediate compatibility with userspace software using the pre-existing xfs ioctls. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:32:09 -07:00
Mark Fasheh	063c4561f5	ocfs2: support for removing file regions Provide an internal interface for the removal of arbitrary file regions. ocfs2_remove_inode_range() takes a byte range within a file and will remove existing extents within that range. Partial clusters will be zeroed so that any read from within the region will return zeros. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:32:08 -07:00
Mark Fasheh	35edec1d52	ocfs2: update truncate handling of partial clusters The partial cluster zeroing code used during truncate usually assumes that the rightmost byte in the range to be zeroed lies on a cluster boundary. This makes sense for truncate, but punching holes might require zeroing on non-aligned rightmost boundaries. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:32:07 -07:00
Mark Fasheh	d0c7d7082e	ocfs2: btree support for removal of arbitrary extents Add code to the btree paths to support the removal of arbitrary regions within an existing extent. With proper higher level support this can be used to "punch holes" in a file. Truncate (a special case of hole punching) could also be converted to use these methods. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:32:05 -07:00
Mark Fasheh	2ae99a6037	ocfs2: Support creation of unwritten extents This can now be trivially supported with re-use of our existing extend code. ocfs2_allocate_unwritten_extents() takes a start offset and a byte length and iterates over the inode, adding extents (marked as unwritten) until len is reached. Existing extents are skipped over. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:32:04 -07:00
Mark Fasheh	b27b7cbcf1	ocfs2: support writing of unwritten extents Update the write code to detect when the user is asking to write to an unwritten extent. Like writing to a hole, we must zero the region between the write and the cluster boundaries. Most of the existing cluster zeroing logic can be re-used with some additional checks for the unwritten flag on extent records. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:32:03 -07:00
Mark Fasheh	0d172baa55	ocfs2: small cleanup of ocfs2_write_begin_nolock() We can easily separate out the write descriptor setup and manipulation into helper functions. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:32:01 -07:00
Mark Fasheh	328d5752e1	ocfs2: btree changes for unwritten extents Writes to a region marked as unwritten might result in a record split or merge. We can support splits by making minor changes to the existing insert code. Merges require left rotations which mostly re-use right rotation support functions. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:32:00 -07:00
Mark Fasheh	c3afcbb344	ocfs2: abstract btree growing calls The top level calls and logic for growing a tree can easily be abstracted out of ocfs2_insert_extent() into a separate function, ocfs2_grow_tree(). This allows future code to easily grow btrees when needed. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:31:58 -07:00
Mark Fasheh	1f6697d072	ocfs2: use all extent block suballocators Now that we have a method to deallocate blocks from them, each node should allocate extent blocks from their local suballocator file. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:31:56 -07:00
Mark Fasheh	59a5e416d1	ocfs2: plug truncate into cached dealloc routines Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:31:55 -07:00
Mark Fasheh	2b604351bc	ocfs2: simplify deallocation locking Deallocation of suballocator blocks, most notably extent blocks, might involve multiple suballocator inodes. The locking for this can get extremely complicated, especially when the suballocator inodes to delete from aren't known until deep within an unrelated codepath. Implement a simple scheme for recording the blocks to be unlinked so that the actual deallocation can be done in a context which won't deadlock. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:31:54 -07:00
Mark Fasheh	bce997682f	ocfs2: harden buffer check during mapping of page blocks We don't want to submit buffer_new blocks for read i/o. This actually won't happen right now because those requests during an allocating write are all nicely aligned. It's probably a good idea to provide an explicit check though. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:31:52 -07:00
Mark Fasheh	7307de8051	ocfs2: shared writeable mmap Implement cluster consistent shared writeable mappings using the ->page_mkwrite() callback. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:31:51 -07:00
Mark Fasheh	607d44aa3f	ocfs2: factor out write aops into nolock variants ocfs2_mkwrite() will want this so that it can add some mmap specific checks before asking for a write. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:31:49 -07:00
Mark Fasheh	3a307ffc27	ocfs2: rework ocfs2_buffered_write_cluster() Use some ideas from the new-aops patch series and turn ocfs2_buffered_write_cluster() into a 2 stage operation with the caller copying data in between. The code now understands multiple cluster writes as a result of having to deal with a full page write for greater than 4k pages. This sets us up to easily call into the write path during ->page_mkwrite(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:31:46 -07:00
Mark Fasheh	2e89b2e48e	ocfs2: take ip_alloc_sem during entire truncate Use of the alloc sem during truncate was too narrow - we want to protect the i_size change and page truncation against mmap now. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:19:57 -07:00
Sunil Mushran	baf4661a82	ocfs2: Add "preferred slot" mount option ocfs2 will attempt to assign the node the slot# provided in the mount option. Failure to assign the preferred slot is not an error. This small feature can be useful for automated testing. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:19:54 -07:00
Shani Moideen	5fb0f7f010	[KJ PATCH] Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c Signed-off-by: Shani Moideen <shani.moideen@wipro.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:19:52 -07:00
Christoph Hellwig	800deef3f6	[PATCH] ocfs2: use list_for_each_entry where beneficial Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:19:49 -07:00
Joel Becker	e6df3a663a	ocfs2: Wake up a starting region if it gets killed in the background. Tell o2hb_region_dev_write() to wake up if rmdir(2) happens on the heartbeat region while it is starting up. Then o2hb_region_dev_write() can check to see if it is alive and act accordingly. This prevents a hang (not being woken) and a crash (if it's woken by a signal). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:19:46 -07:00
Joel Becker	16c6a4f24d	ocfs2: live heartbeat depends on the local node configuration Removing the local node configuration out from underneath a running heartbeat is "bad". Provide an API in the ocfs2 nodemanager to request a configfs dependency on the local node, then use it in heartbeat. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:19:43 -07:00
Joel Becker	14829422be	ocfs2: Depend on configfs heartbeat items. ocfs2 mounts require a heartbeat region. Use the new configfs_depend_item() facility to actually depend on them so they can't go away from under us. First, teach cluster/nodemanager.c to depend an item on the o2cb subsystem. Then teach o2hb_register_callbacks to take a UUID and depend on the appropriate region. Finally, teach all users of o2hb to pass a UUID or NULL if they don't require a pin. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:19:40 -07:00
Joel Becker	631d1febab	configfs: config item dependencies. Sometimes other drivers depend on particular configfs items. For example, ocfs2 mounts depend on a heartbeat region item. If that region item is removed with rmdir(2), the ocfs2 mount must BUG or go readonly. Not happy. This provides two additional API calls: configfs_depend_item() and configfs_undepend_item(). A client driver can call configfs_depend_item() on an existing item to tell configfs that it is depended on. configfs will then return -EBUSY from rmdir(2) for that item. When the item is no longer depended on, the client driver calls configfs_undepend_item() on it. These APIs cannot be called underneath any configfs callbacks, as they will conflict. They can block and allocate. A client driver probably shouldn't be calling them of its own gumption. Rather, it should be providing an API that external subsystems call. How does this work? Imagine the ocfs2 mount process. When it mounts, it asks for a heartbeat region item. This is done via a call into the heartbeat code. Inside the heartbeat code, the region item is looked up. Here, the heartbeat code calls configfs_depend_item(). If it succeeds, then heartbeat knows the region is safe to give to ocfs2. If it fails, it was being torn down anyway, and heartbeat can gracefully pass up an error. [ Fixed some bad whitespace in configfs.txt. --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:18:59 -07:00
Joel Becker	299894cc90	configfs: accessing item hierarchy during rmdir(2) Add a notification callback, ops->disconnect_notify(). It has the same prototype as ->drop_item(), but it will be called just before the item linkage is broken. This way, configfs users who want to do work while the object is still in the hierarchy have a chance. Client drivers will still need to config_item_put() in their ->drop_item(), if they implement it. They need do nothing in ->disconnect_notify(). They don't have to provide it if they don't care. But someone who wants to be notified before ci_parent is set to NULL can now be notified. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:11:01 -07:00
Johannes Berg	6d748924b7	[PATCH] configfs buffer: use mutex Seems copied from sysfs, but I don't see a reason here or there to use a semaphore instead of a mutex. Convert. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:10:58 -07:00
Joel Becker	e6bd07aee7	configfs: Convert subsystem semaphore to mutex Convert the su_sem member of struct configfs_subsystem to a struct mutex, as that's what it is. Also convert all the users and update Documentation/configfs.txt and Documentation/configfs_example.c accordingly. [ Conflict in fs/dlm/config.c with commit `3168b0780d` manually resolved. --Mark ] Inspired-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:10:56 -07:00
Satyam Sharma	3fe6c5ce11	[PATCH] configfs+dlm: Rename config_group_find_obj and state semantics clearly Configfs being based upon sysfs code, config_group_find_obj() is probably so named because of the similar kset_find_obj() in sysfs. However, "kobject"s in sysfs become "config_item"s in configfs, so let's call it config_group_find_item() instead, for the sake of uniformity, and make the corresponding change in the users of this function. BTW, a crucial difference between kset_find_obj and config_group_find_item is in locking expectations. kset_find_obj does its locking by itself, but config_group_find_item expects the caller to do the locking. The reason for this: ksets have their own locks; config_groups don't, but instead rely on the subsystem mutex. And the subsystem needn't necessarily be around when config_group_find_item() is called. So let's state these locking semantics explicitly, and rectify the comment; otherwise bugs could continue to occur in the future, as they did in the past (see commit d82b8191e238 in gfs2-2.6-fixes.git). [ I also took the opportunity to fix some bad whitespace and double-empty lines. --Joel ] [ Conflict in fs/dlm/config.c with commit `3168b0780d` manually resolved. --Mark ] Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Cc: David Teigland <teigland@redhat.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 17:02:31 -07:00
Satyam Sharma	9b1d9aa4e9	[PATCH] configfs+dlm: Separate out __CONFIGFS_ATTR into configfs.h fs/dlm/config.c contains a useful generic macro called __CONFIGFS_ATTR that is similar to sysfs' __ATTR macro that makes defining attributes easy for any user of configfs. Separate it out into configfs.h so that other users (forthcoming in dynamic netconsole patchset) can use it too. Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Cc: David Teigland <teigland@redhat.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 16:52:27 -07:00
Satyam Sharma	4c62b53454	configfs: misc cleanups 1. item.c:config_item_cleanup() is a private function (only called by config_item_release() in the same file). However, it is spuriously exported in include/linux/configfs.h, so remove that export and make it static in item.c. Also, since it is no longer an exported/interface function, there is no need for a comment on this function (the comment was stating the obvious anyway). 2. Kernel-doc comment format does not allow an empty line between the end of the comment and the start of the function (declaration line). There were several such spurious empty lines in item.c, so fix them. fs/configfs/item.c | 15 +++------------ include/linux/configfs.h | 1 - 2 files changed, 3 insertions(+), 13 deletions(-) Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 16:52:25 -07:00
Joel Becker	b23cdde4c6	configfs: consistent attribute size The attribute store/show code currently limits attributes at PAGE_SIZE. This code comes from sysfs, where it still works that way. However, PAGE_SIZE is not constant. A 16k attribute string works on ia64 but not on x86. Really a subsystem shouldn't allow different attribute sizes based on platform. As such, limit all simple attributes to 4k. This works on all platforms, and is consistent with all current code. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-07-10 16:52:22 -07:00
Linus Torvalds	9f9d763216	Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6 * 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: [S390] vmlogrdr function annotation. [S390] s390: rename CPU_IDLE to S390_CPU_IDLE [S390] cio: Remove prototype for non-existing function cmf_reset(). [S390] zcrypt: fix request timeout handling [S390] system call optimization. [S390] dasd: Avoid compile warnings on !CONFIG_DASD_PROFILE [S390] Remove volatile from atomic_t [S390] Program check in diag 210 under 31 bit [S390] Bogomips calculation for 64 bit. [S390] smp: Merge smp_count_cpus() and smp_get_save_areas(). [S390] zcore: Fix __user annotation. [S390] fixed cdl-format detection. [S390] sclp: Test facility list before executing a service call. [S390] sclp: introduce some new interfaces. [S390] Fixed comment typo. [S390] vmcp cleanup	2007-07-10 14:46:09 -07:00
Linus Torvalds	1b21f458dd	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (57 commits) [GFS2] Accept old format NFS filehandles [GFS2] Small fixes to logging code [DLM] dump more lock values [GFS2] Remove i_mode passing from NFS File Handle [GFS2] Obtaining no_formal_ino from directory entry [GFS2] git-gfs2-nmw-build-fix [GFS2] System won't suspend with GFS2 file system mounted [GFS2] remounting w/o acl option leaves acls enabled [GFS2] inode size inconsistency [DLM] Telnet to port 21064 can stop all lockspaces [GFS2] Fix gfs2_block_truncate_page err return [GFS2] Addendum to the journaled file/unmount patch [GFS2] Simplify multiple glock acquisition [GFS2] assertion failure after writing to journaled file, umount [GFS2] Use zero_user_page() in stuffed_readpage() [GFS2] Remove bogus '\0' in rgrp.c [GFS2] Journaled file write/unstuff bug [DLM] don't require FS flag on all nodes [GFS2] Fix deallocation issues [GFS2] return conflicts for GETLK ...	2007-07-10 13:56:13 -07:00
Linus Torvalds	01370f0603	Merge branch 'splice-2.6.23' of git://git.kernel.dk/data/git/linux-2.6-block * 'splice-2.6.23' of git://git.kernel.dk/data/git/linux-2.6-block: pipe: add documentation and comments pipe: change the ->pin() operation to ->confirm() Remove remnants of sendfile() xip sendfile removal splice: completely document external interface with kerneldoc sendfile: remove bad_sendfile() from bad_file_ops shmem: convert to using splice instead of sendfile() relay: use splice_to_pipe() instead of open-coding the pipe loop pipe: allow passing around of ops private pointer splice: divorce the splice structure/function definitions from the pipe header splice: relay support sendfile: convert nfsd to splice_direct_to_actor() sendfile: convert nfs to using splice_read() loop: convert to using splice_direct_to_actor() instead of sendfile() splice: add void cookie to the actor data sendfile: kill generic_file_sendfile() sendfile: remove .sendfile from filesystems that use generic_file_sendfile() sys_sendfile: switch to using ->splice_read, if available vmsplice: add vmsplice-to-user support splice: abstract out actor data	2007-07-10 13:51:06 -07:00
Steven Whitehouse	3ebf44902f	[GFS2] Accept old format NFS filehandles On Tue, 2007-07-10 at 10:06 +0100, Christoph Hellwig wrote: > > -#define GFS2_LARGE_FH_SIZE 10 > > - > > -struct gfs2_fh_obj { > > - struct gfs2_inum_host this; > > - u32 imode; > > -}; > > +#define GFS2_LARGE_FH_SIZE 8 > > Because gfs2_decode_fh only accepts file handles with GFS2_LARGE_FH_SIZE > or GFS2_LARGE_FH_SIZE you don't accept filehandles sent out by an older > gfs version anymore. Stale filehandles because of a new kernel version > are a big no-no, so please add back code to handle the old filehandles > on the decode side. > This should fix that problem, I think. Since the difference only relates to the end of the fh, we can just ignore that field in order to accept the older format. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Wendy Cheng <wcheng@redhat.com>	2007-07-10 12:28:27 +01:00
Stefan Haberland	bf1a95a225	[S390] fixed cdl-format detection. CDL-formatted DASDs are now detected correctly even if no VOL1 label is on the disk. This prevents possible loss of data. Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-07-10 11:24:44 +02:00
Jens Axboe	0845718daf	pipe: add documentation and comments At Andrew Morton's request, here's a set of documentation for the generic pipe_buf_operations hooks and the pipe and pipe_buffer structures. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:16 +02:00
Jens Axboe	cac36bb06e	pipe: change the ->pin() operation to ->confirm() The name 'pin' was badly chosen; it doesn't pin a pipe buffer in the most commonly used sense in the kernel. So change the name to 'confirm', after debating this issue with Hugh Dickins a bit. A good return from ->confirm() means that the buffer is really there, and that the contents are good. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:15 +02:00
Jens Axboe	d96e6e7164	Remove remnants of sendfile() There are now zero users of .sendfile() in the kernel, so kill it from the file_operations structure and in do_sendfile(). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:15 +02:00
Carsten Otte	d054fe3d10	xip sendfile removal This patch removes xip_file_sendfile, the sendfile implementation for xip, without replacement. As far as we know, the customers that use xip on s390 are not using sendfile(), and so far s390 is the only platform this could potentially be used on. Having sendfile is not a popular feature for execute-in-place file systems; however, we have a working implementation of splice_read() based on fs/splice.c if anyone asks for it. At this point in time, it does not seem preferable to merge splice_read() for xip, because it causes extra maintenance effort due to code duplication and it requires struct page behind the xip memory segment. We'd like to get rid of that soon in favor of supporting flash-based embedded platforms (Monta Vista work). Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:15 +02:00
Jens Axboe	932cc6d4f7	splice: completely document external interface with kerneldoc Also add fs/splice.c as a kerneldoc target with a smaller blurb that should be expanded to better explain the overview of splice. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:15 +02:00
Jens Axboe	d6f517568f	sendfile: remove bad_sendfile() from bad_file_ops do_sendfile() prefers splice over sendfile, so it should not trigger (directly, at least). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:15 +02:00
Jens Axboe	497f9625c2	pipe: allow passing around of ops private pointer relay needs this for proper consumption handling, and the network receive support needs it as well to lookup the sk_buff on pipe release. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:14 +02:00
Jens Axboe	d6b29d7cee	splice: divorce the splice structure/function definitions from the pipe header We need to move even more stuff into the header so that folks can use the splice_to_pipe() implementation instead of open-coding a lot of pipe knowledge (see relay implementation), so move to our own header file finally. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:14 +02:00
Jens Axboe	cf8208d0ea	sendfile: convert nfsd to splice_direct_to_actor() Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:14 +02:00
Jens Axboe	f0930fffa9	sendfile: convert nfs to using splice_read() Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:14 +02:00
Jens Axboe	5ffc4ef45b	sendfile: remove .sendfile from filesystems that use generic_file_sendfile() They can use generic_file_splice_read() instead. Since sys_sendfile() now prefers that, there should be no change in behaviour. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:13 +02:00
Jens Axboe	534f2aaa6a	sys_sendfile: switch to using ->splice_read, if available This patch makes sendfile prefer to use ->splice_read(), if it's available in the file_operations structure. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:12 +02:00
Jens Axboe	6a14b90bb6	vmsplice: add vmsplice-to-user support A bit of a cheat, it actually just copies the data to userspace. But this makes the interface nice and symmetric and enables people to build on splice, with room for future improvement in performance. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:12 +02:00
Jens Axboe	c66ab6fa70	splice: abstract out actor data For direct splicing (or private splicing), the output may not be a file. So abstract out the handling into a specified actor function and put the data in the splice_desc structure earlier, so we can build on top of that. This is the first step in better splice handling for drivers, and also for implementing vmsplice _to_ user memory. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:04:12 +02:00
Adrian Bunk	72d3a38ee0	unexport bio_{,un}map_user bio_{,un}map_user no longer have any modular users. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 08:03:34 +02:00
Steve French	fb8c4b14d9	[CIFS] whitespace cleanup More than halfway there Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-10 01:16:18 +00:00
Linus Torvalds	71d441ddb5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: JFS: Update print_hex_dump() syntax JFS: use print_hex_dump() rather than private dump_mem() function JFS: Whitespace cleanup and remove some dead code	2007-07-09 13:09:16 -07:00
Ingo Molnar	43ae34cb4c	sched: scheduler debugging, core scheduler debugging core: implement /proc/sched_debug and /proc/<PID>/sched files for scheduler debugging. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-07-09 18:52:00 +02:00
Balbir Singh	172ba844a8	sched: update delay-accounting to use CFS's precise stats update delay-accounting to use CFS's precise stats. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-07-09 18:52:00 +02:00
Ingo Molnar	b27f03d4bd	sched: make use of precise accounting for /proc task stats make use of CFS's precise accounting to drive /proc/<pid>/stat statistics. this code was co-authored by: Balbir Singh <balbir@linux.vnet.ibm.com> Dmitry Adamushko <dmitry.adamushko@gmail.com> Ingo Molnar <mingo@elte.hu> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>	2007-07-09 18:51:59 +02:00
Ingo Molnar	62480d13d5	sched: remove the SleepAVG field remove the SleepAVG field from /proc/<pid>/status, as with the removal of the sleep-average code this value no longer makes sense. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-07-09 18:51:59 +02:00
Steven Whitehouse	a0a24741ca	[GFS2] Small fixes to logging code This reverts part of an earlier patch which tried to reclaim gfs2_bufdata structures too early and resulted in a "use after free" case (this bit from me). Also a change to not write out log headers unless we really need to (in the case of flushing nothing we don't need a header) from Bob. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2007-07-09 15:43:07 +01:00
Steve French	b609f06ac4	[CIFS] Fix packet signatures for NTLMv2 case Signed-off-by: Yehuda Sadeh Weinraub <Yehuda.Sadeh@expand.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-09 07:55:14 +00:00
David Teigland	ac90a25525	[DLM] dump more lock values Add two more output fields (lkb_flags and rsb nodeid) to the new debugfs file that dumps one lock per line. Also, dump all locks instead of just mastered locks. Accordingly, use a suffix of _locks instead of _master. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:24:13 +01:00
Wendy Cheng	35dcc52e3a	[GFS2] Remove i_mode passing from NFS File Handle GFS2 has been passing i_mode within NFS File Handle. Other than the wrong assumption that there is always room for this extra 16 bit value, the current gfs2_get_dentry doesn't really need the i_mode to work correctly. Note that GFS2 NFS code does go thru the same lookup code path as direct file access route (where the mode is obtained from name lookup) but gfs2_get_dentry() is coded for different purpose. It is not used during lookup time. It is part of the file access procedure call. When the call is invoked, if on-disk inode is not in-memory, it has to be read-in. This makes i_mode passing a useless overhead. Signed-off-by: S. Wendy Cheng <wcheng@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:24:11 +01:00
Wendy Cheng	bb9bcf0616	[GFS2] Obtaining no_formal_ino from directory entry GFS2 lookup code doesn't ask for inode shared glock. This implies that during in-memory inode creation for an existing file, GFS2 will not read the inode contents in from disk. This leaves no_formal_ino uninitialized during lookup time. The uninitialized no_formal_ino is subsequently encoded into the file handle. Clients will get an ESTALE error whenever they try to access these files. Signed-off-by: S. Wendy Cheng <wcheng@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:24:08 +01:00
akpm@linux-foundation.org	f4fadb23ca	[GFS2] git-gfs2-nmw-build-fix Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:24:06 +01:00
Abhijith Das	b365762924	[GFS2] System won't suspend with GFS2 file system mounted The kernel threads in gfs2, namely gfs2_scand, gfs2_logd, gfs2_quotad, gfs2_glockd, gfs2_recoverd weren't doing anything when the suspend mechanism was trying to freeze them. I put in calls to refrigerator() in the loops for all the daemons and suspend works as expected. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:24:04 +01:00
Bob Peterson	569a7b6c2e	[GFS2] remounting w/o acl option leaves acls enabled This patch is for bugzilla bug #245663. This crosswrites a fix from gfs1 (bz #210369) so that the mount options are reset properly upon remount. This was tested on system trin-10. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:24:01 +01:00
Wendy Cheng	090ffaa55d	[GFS2] inode size inconsistency This should have been part of the NFS patch #1 but somehow I missed it when packaging the patches. It is not as critical an issue as the others (I hope). RHEL 5.1 31.el5 kernel runs fine without this change. Our truncate code is chopped into two parts, one for vfs inode changes (in vmtruncate()) and one for the gfs inode (in gfs2_truncatei()). These two operations are, unfortunately, not atomic. So it could happen that vmtruncate() succeeds (inode->i_size is changed) but gfs2_truncatei fails (say, the kernel is temporarily out of memory). This would leave gfs inode i_di.di_size out of sync with vfs inode i_size. It will later confuse gfs2_commit_write() if a write is issued. Last time I checked, it will cause file corruption. Signed-off-by: S. Wendy Cheng <wcheng@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:59 +01:00
Patrick Caulfield	97d848365e	[DLM] Telnet to port 21064 can stop all lockspaces This patch fixes Red Hat bz#245892. Opening a TCP connection from one cluster member to the dlm port of another cluster member is enough to stop every dlm operation in the cluster. This means that GFS and rgmanager will hang. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:57 +01:00
S. Wendy Cheng	1875f2f31b	[GFS2] Fix gfs2_block_truncate_page err return Code segment inside gfs2_block_truncate_page() doesn't set the return code correctly. This causes NFSD to erroneously return EIO to the client on a setattr procedure call (truncate error). Signed-off-by: S. Wendy Cheng <wcheng@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:54 +01:00
Robert Peterson	773ed1a044	[GFS2] Addendum to the journaled file/unmount patch This patch is an addendum to the previous journaled file/unmount patch. It fixes a problem discovered during testing. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:52 +01:00
Steven Whitehouse	eaf5bd3cac	[GFS2] Simplify multiple glock aquisition There is a bug in the code which acquires multiple glocks where, if the initial out-of-order attempt fails part way through, we can land up trying to acquire the wrong number of glocks. This is part of the fix for Red Hat bz #239737. The other part of the bz doesn't apply to upstream kernels since it was fixed by: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d3717bdf8f08a0e1039158c8bab2c24d20f492b6 Since the out-of-order code doesn't appear to add anything to the performance of GFS2, this patch just removes it rather than trying to fix it. It should be much easier to see what's going on here now. In addition, we don't allocate any memory unless we are using a lot of glocks (which is a relatively uncommon case). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:50 +01:00
Robert Peterson	2332c4435b	[GFS2] assertion failure after writing to journaled file, umount This patch passes all my nasty tests that were causing the code to fail under one circumstance or another. Here is a complete summary of all changes from today's git tree, in order of appearance: 1. There are now separate variables for metadata buffer accounting. 2. Variable sd_log_num_hdrs is no longer needed, since the header accounting is taken care of by the reserve/refund sequence. 3. Fixed a tiny grammatical problem in a comment. 4. Added a new function "calc_reserved" to calculate the reserved log space. This isn't entirely necessary, but it has two benefits: First, it simplifies the gfs2_log_refund function greatly. Second, it allows for easier debugging because I could sprinkle the code with calls to this function to make sure the accounting is proper (by adding asserts and printks) at strategic points in the code. 5. In log_pull_tail there apparently was a kludge to fix up the accounting based on a "pull" parameter. The buffer accounting is now done properly, so the kludge was removed. 6. File sync operations were making a call to gfs2_log_flush that writes another journal header. Since that header was not planned for (reserved) by the reserve/refund sequence, the free space had to be decremented so that when log_pull_tail gets called, the free space is adjusted properly. (Did I hear you call that a kludge? Well, maybe, but a lot more justifiable than the one I removed.) 7. In the gfs2_log_shutdown code, it optionally syncs the log by specifying the PULL parameter to log_write_header. I'm not sure this is necessary anymore. It just seems to me there could be cases where shutdown is called while there are outstanding log buffers. 8. In the (data)buf_lo_before_commit functions, I changed some offset values from being calculated on the fly to being constants. That simplified some code and we might as well let the compiler do the calculation once rather than redoing those cycles at run time. 9. This version has my rewritten databuf_lo_add function. This version is much more like its predecessor, buf_lo_add, which makes it easier to understand. Again, this might not be necessary, but it seems as if this one works as well as the previous one, maybe even better, so I decided to leave it in. 10. In databuf_lo_before_commit, a previous data corruption problem was caused by going off the end of the buffer. The proper solution is to have the proper limit in place, rather than stopping earlier. (Thus my previous attempt to fix it was wrong.) If you don't wrap the buffer, you're stopping too early and that causes more log buffer accounting problems. 11. In lops.h there are two new (previously mentioned) constants for figuring out the data offset for the journal buffers. 12. There are also two new functions, buf_limit and databuf_limit, to calculate how many entries will fit in the buffer. 13. In function gfs2_meta_wipe, it needs to distinguish between pinned metadata buffers and journaled data buffers for proper journal buffer accounting. It can't use the JDATA gfs2_inode flag because it's sometimes passed the "real" inode and sometimes the "metadata inode" and the inode flags will be random bits in a metadata gfs2_inode. It needs to base its decision on which was passed in. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:47 +01:00
Steven Whitehouse	2840501ac8	[GFS2] Use zero_user_page() in stuffed_readpage() As suggested by Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Robert P. J. Day <rpjday@mindspring.com>	2007-07-09 08:23:45 +01:00
Steven Whitehouse	c4201214cb	[GFS2] Remove bogus '\0' in rgrp.c Not sure how it slipped in, but we don't want it anyway. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:43 +01:00
Robert Peterson	8fb68595d5	[GFS2] Journaled file write/unstuff bug This patch is for bugzilla bug 283162, which uncovered a number of bugs pertaining to writing to files that have the journaled bit on. These bugs happen most often when writing to the meta_fs because the files are always journaled. So operations like gfs2_grow were particularly vulnerable, although many of the problems could be recreated with normal files after setting the journaled bit on. The problems fixed are: -GFS2 wasn't ever writing unstuffed journaled data blocks to their in-place location on disk. Now it does. -If you unmounted too quickly after doing IO to a journaled file, GFS2 was crashing because you would discard a buffer whose bufdata was still on the active items list. GFS2 now deals with this gracefully. -GFS2 was losing track of the bufdata for journaled data blocks, and it wasn't getting freed, causing an error when you tried to unmount the module. GFS2 now frees all the bufdata structures. -There was a memory corruption occurring because GFS2 wrote twice as many log entries for journaled buffers. -It was occasionally trying to write journal headers in buffers that weren't currently mapped. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:40 +01:00
David Teigland	fad59c1390	[DLM] don't require FS flag on all nodes Mask off the recently added DLM_LSFL_FS flag when setting the exflags. This way all the nodes in the lockspace aren't required to have the FS flag set, since we later check that exflags matches among all nodes. Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:38 +01:00
Abhijith Das	d93cfa9884	[GFS2] Fix deallocation issues There were two issues during deallocation of unlinked inodes. The first was relating to the use of a "try" lock which in the case of the inode lock wasn't trying hard enough to deallocate in all circumstances (now changed to a normal glock) and in the case of the iopen lock didn't wait for the demotion of the shared lock before attempting to get the exclusive lock, and thereby sometimes (timing dependent) not completing the deallocation when it should have done. The second issue related to the lack of a way to invalidate dcache entries on remote nodes (now fixed by this patch) which meant that unlinks were taking a long time to return disk space to the fs. By adding some code to invalidate the dcache entries across the cluster for unlinked inodes, that is now fixed. This patch was written jointly by Abhijith Das and Steven Whitehouse. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:36 +01:00
David Teigland	a7a2ff8a95	[GFS2] return conflicts for GETLK We weren't returning the correct result when GETLK found a conflict, which is indicated by userspace passing back a 1. Signed-off-by: Abhijith Das <adas redhat com> Signed-off-by: David Teigland <teigland redhat com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:33 +01:00
David Teigland	d88101d4d8	[GFS2] set plock owner in GETLK info Set the owner field in the plock info sent to userspace for GETLK. Without this, gfs_controld won't correctly see when the GETLK from a process matches one of the process's existing locks. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:31 +01:00
akpm@linux-foundation.org	037bcbb756	[GFS2] gfs2_lookupi() uninitialised var fix fs/gfs2/inode.c: In function 'gfs2_lookupi': fs/gfs2/inode.c:392: warning: 'error' may be used uninitialized in this function Looks like a real bug to me. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:29 +01:00
Steven Whitehouse	c8cdf47937	[GFS2] Recovery for lost unlinked inodes Under certain circumstances it's possible (though rather unlikely) that inodes which were unlinked by one node while still open on another might get "lost" in the sense that they don't get deallocated if the node which held the inode open crashed before it was unlinked. This patch adds the recovery code which allows automatic deallocation of the inode if it's found during block allocation (the sensible time to look for such inodes since we are scanning the rgrp's bitmaps anyway at this time, so it adds no overhead to do this). Since the inode will have had its i_nlink set to zero, all we need to trigger recovery is a lookup and an iput(), and the normal deallocation code takes care of the rest. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:26 +01:00
Robert Peterson	b35997d448	[GFS2] Can't mount GFS2 file system on AoE device This patch fixes bug 243131: Can't mount GFS2 file system on AoE device. When using AoE devices with lock_nolock, there is no locking table, so gfs2 (and gfs1) uses the superblock s_id. This turns out to be the device name in some cases. In the case of AoE, the device contains a slash, (e.g. "etherd/e1.1p2") which is an invalid character when we try to register the table in sysfs. This patch replaces the "/" with underscore. Rather than add a new variable to the stack, I'm just reusing a (char *) variable that's no longer used: table. This code has been tested on the failing system using a RHEL5 patch. The upstream code was tested by using gfs2_tool sb to interject a "/" into the table name of a clustered gfs2 file system. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:24 +01:00
Steven Whitehouse	e1cc86037b	[GFS2] Fix bug in error path of inode This fixes a bug in the ordering of operations in the error path of createi. It's not valid to do an iput() when holding the inode's glock since the iput() will (in this case) result in delete_inode() being called, which needs to grab the lock itself. This was causing the recursive lock checking code to trigger. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:22 +01:00
Steven Whitehouse	ffed8ab342	[GFS2] Fix typo in rename of directories A typo caused us to pass a NULL pointer when renaming directories. It was accidentally introduced in: [GFS2] Clean up inode number handling Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:19 +01:00
Patrick Caulfield	44f487a553	[DLM] variable allocation Add a new flag, DLM_LSFL_FS, to be used when a file system creates a lockspace. This flag causes the dlm to use GFP_NOFS for allocations instead of GFP_KERNEL. (This updated version of the patch uses gfp_t for ls_allocation.) Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-Off-By: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:17 +01:00
Josef Bacik	292e539e93	[DLM] fix reference counting This is a fix for the patch 021d2ff3a08019260a1dc002793c92d6bf18afb6 I left off a dlm_hold_rsb which causes the box to panic if you try to use debugfs. This patch fixes the problem. Sorry about that, Signed-off-by: Josef Bacik <jwhiter@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:15 +01:00
Steven Whitehouse	4bd91ba181	[GFS2] Add nanosecond timestamp feature This adds a nanosecond timestamp feature to the GFS2 filesystem. Due to the way that the on-disk format works, older filesystems will just appear to have this field set to zero. When mounted by an older version of GFS2, the filesystem will simply ignore the extra fields so that it will again appear to have whole-second resolution, so it's trivially backward compatible. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:12 +01:00
Steven Whitehouse	bb8d8a6f54	[GFS2] Fix sign problem in quota/statfs and cleanup _host structures This patch fixes some sign issues which were accidentally introduced into the quota & statfs code during the endianess annotation process. Also included is a general clean up which moves all of the _host structures out of gfs2_ondisk.h (where they should not have been to start with) and into the places where they are actually used (often only one place). Also those _host structures which are not required any more are removed entirely (which is the eventual plan for all of them). The conversion routines from ondisk.c are also moved into the places where they are actually used, which for almost every one, was just one single place, so all those are now static functions. This also cleans up the end of gfs2_ondisk.h which no longer needs the #ifdef __KERNEL__. The net result is a reduction of about 100 lines of code, many functions now marked static plus the bug fixes as mentioned above. For good measure I ran the code through sparse after making these changes to check that there are no warnings generated. This fixes Red Hat bz #239686 Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:10 +01:00
Benjamin Marzinski	ddf4b426aa	[GFS2] fix jdata issues This is a patch for the first three issues of RHBZ #238162. The first issue is that when you allocate a new page for a file, it will not start off uptodate. This makes sense, since you haven't written anything to that part of the file yet. Unfortunately, gfs2_pin() checks to make sure that the buffers are uptodate. The solution to this is to mark the buffers uptodate in gfs2_commit_write(), after they have been zeroed out and have the data written into them. I'm pretty confident in this fix, although it's not completely obvious that there is no problem with marking the buffers uptodate here. The second issue is simply that you can try to pin a data buffer that is already on the incore log, and thus, already pinned. This patch checks to see if this buffer is already on the log, and exits databuf_lo_add() if it is, just like buf_lo_add() does. The third issue is that gfs2_log_flush() doesn't do its block accounting correctly. Both metadata and journaled data are logged, but gfs2_log_flush() only compares the number of metadata blocks with the number of blocks to commit to the ondisk journal. This patch also counts the journaled data blocks. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:08 +01:00
Patrick Caulfield	afb853fb4e	[DLM] fix socket shutdown This patch clears the user_data of active sockets as part of cleanup. This prevents any late-arriving data from trying to add jobs to the work queue while we are tidying up. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-Off-By: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:05 +01:00
Steven Whitehouse	89918647a4	[GFS2] Make the log reserved blocks depend on block size The number of blocks which we reserve in the log at the start of each transaction needs to depend upon the block size, since the overhead is related to the number of "pointers" which can be fitted into a single block. This relates to Red Hat bz #240435. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:03 +01:00
Abhijith Das	1990e91765	[GFS2] Quotas non-functional - fix another bug This patch fixes a bug where gfs2 was writing update quota usage information to the wrong location in the quota file. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:01 +01:00
David Teigland	0b7cac0fb0	[DLM] show default protocol Display the initial value of the "protocol" config value in configfs. The default value has always been 0 in the past anyway, so it's always appeared to be correct. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:59 +01:00
David Teigland	9dd592d70b	[DLM] dumping master locks Add a new debugfs file that dumps a compact list of mastered locks. This will be used by a userland daemon to collect state for deadlock detection. Also, for the existing function that prints all lock state, lock the rsb before going through the lock lists since they can be changing in the course of normal dlm activity. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:56 +01:00
David Teigland	8b4021fa43	[DLM] canceling deadlocked lock Add a function that can be used through libdlm by a system daemon to cancel another process's deadlocked lock. A completion ast with EDEADLK is returned to the process waiting for the lock. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:54 +01:00
David Teigland	84d8cd69a8	[DLM] timeout fixes Various fixes related to the new timeout feature: - add_timeout() missed setting TIMEWARN flag on lkb's when the TIMEOUT flag was already set - clear_proc_locks should remove a dead process's locks from the timeout list - the end-of-life calculation for user locks needs to consider that ETIMEDOUT is equivalent to -DLM_ECANCEL - make initial default timewarn_cs config value visible in configfs - change bit position of TIMEOUT_CANCEL flag so it's not copied to a remote master node - set timestamp on remote lkb's so a lock dump will display the time they've been waiting Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:52 +01:00
Steven Whitehouse	b3cab7b9a3	[DLM] Compile fix A one liner fix which got missed from the earlier patches. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Fabio Massimo Di Nitto <fabbione@ubuntu.com> Cc: David Teigland <teigland@redhat.com>	2007-07-09 08:22:49 +01:00
David Teigland	639aca417d	[DLM] fix compile breakage In the rush to get the previous patch set sent, a compilation bug I fixed shortly before sending somehow got clobbered, probably by a missed quilt refresh or something. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:45 +01:00
David Teigland	8b0e7b2cf3	[DLM] wait for config check during join [6/6] Joining the lockspace should wait for the initial round of inter-node config checks to complete before returning. This way, if there's a configuration mismatch between the joining node and the existing nodes, the join can fail and return an error to the application. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:42 +01:00
David Teigland	79d72b5448	[DLM] fix new_lockspace error exit [5/6] Fix the error path when exiting new_lockspace(). It was kfree'ing the lockspace struct at the end, but that's only valid if it exits before kobject_register occurred. After kobject_register we have to let the kobject do the freeing. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:40 +01:00
David Teigland	c85d65e914	[DLM] cancel in conversion deadlock [4/6] When conversion deadlock is detected, cancel the conversion and return EDEADLK to the application. This is a new default behavior; previously the dlm would allow the deadlock to exist indefinitely. The DLM_LKF_NODLCKWT flag can now be used in a conversion to prevent the dlm from performing conversion deadlock detection/cancelation on it. The DLM_LKF_CONVDEADLK flag can continue to be used as before to tell the dlm to demote the granted mode of the lock being converted if it gets into a conversion deadlock. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:38 +01:00
David Teigland	d7db923ea4	[DLM] dlm_device interface changes [3/6] Change the user/kernel device interface used by libdlm: - Add ability for userspace to check the version of the interface. libdlm can now adapt to different versions of the kernel interface. - Increase the size of the flags passed in a lock request so all possible flags can be used from userspace. - Add an opaque "xid" value for each lock. This "transaction id" will be used later to associate locks with each other during deadlock detection. - Add a "timeout" value for each lock. This is used along with the DLM_LKF_TIMEOUT flag. Also, remove a fragment of unused code in device_read(). This patch requires updating libdlm which is backward compatible with older kernels. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:36 +01:00
David Teigland	3ae1acf93a	[DLM] add lock timeouts and warnings [2/6] New features: lock timeouts and time warnings. If the DLM_LKF_TIMEOUT flag is set, then the request/conversion will be canceled after waiting the specified number of centiseconds (specified per lock). This feature is only available for locks requested through libdlm (can be enabled for kernel dlm users if there's a use for it.) If the new DLM_LSFL_TIMEWARN flag is set when creating the lockspace, then a warning message will be sent to userspace (using genetlink) after a request/conversion has been waiting for a given number of centiseconds (configurable per node). The time warnings will be used in the future to do deadlock detection in userspace. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:33 +01:00
David Teigland	85e86edf95	[DLM] block scand during recovery [1/6] Don't let dlm_scand run during recovery since it may try to do a resource directory removal while the directory nodes are changing. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:31 +01:00
Josef Bacik	916297aad5	[DLM] keep dlm from panicking when traversing rsb list in debugfs This problem was originally reported against GFS6.1, but the same issue exists in upstream DLM. This patch keeps the rsb iterator assigning under the rsbtbl list lock. Each time we process an rsb we grab a reference to it to make sure it is not freed out from underneath us, and then put it when we get the next rsb in the list or move onto another list. Signed-off-by: Josef Bacik <jwhiter@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:29 +01:00
Abhijith Das	2a87ab0806	[GFS2] Quotas non-functional - fix bug This patch fixes an error in the quota code where a 'struct gfs2_quota_lvb' was being passed to gfs2_adjust_quota() instead of a 'struct gfs2_quota_data'. Also moved 'struct gfs2_quota_lvb' from fs/gfs2/incore.h to include/linux/gfs2_ondisk.h as per Steve's suggestion. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:26 +01:00
Steven Whitehouse	dbb7cae2a3	[GFS2] Clean up inode number handling This patch cleans up the inode number handling code. The main difference is that instead of looking up the inodes using a struct gfs2_inum_host we now use just the no_addr member of this structure. The tests relating to no_formal_ino can then be done by the calling code. This has advantages in that we want to do different things in different code paths if the no_formal_ino doesn't match. In the NFS patch we want to return -ESTALE, but in the ->lookup() path, it's a bug in the fs if the no_formal_ino doesn't match and thus we can withdraw in this case. In order to later fix bz #201012, we need to be able to look up an inode without knowing no_formal_ino, as the only information that is known to us is the on-disk location of the inode in question. This patch will also help us to fix bz #236099 at a later date by cleaning up a lot of the code in that area. There are no user visible changes as a result of this patch and there are no changes to the on-disk format either. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:24 +01:00
Steven Whitehouse	41d7db0ab4	[GFS2] Reduce size of struct gdlm_lock This patch removes the completion (which is rather large) from struct gdlm_lock in favour of using the wait_on_bit() functions. We don't need to add any extra fields to the structure to do this, so we save 32 bytes (on x86_64) per structure. This adds up to quite a lot when we may potentially have millions of these lock structures. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: David Teigland <teigland@redhat.com>	2007-07-09 08:22:21 +01:00
Robert Peterson	cd81a4bac6	[GFS2] Addendum patch 2 for gfs2_grow This addendum patch 2 corrects three things: 1. It fixes a stupid mistake in the previous addendum that broke gfs2. Ref: https://www.redhat.com/archives/cluster-devel/2007-May/msg00162.html 2. It fixes a problem that Dave Teigland pointed out regarding the external declarations in ops_address.h being in the wrong place. 3. It recasts a couple more %llu printks to (unsigned long long) as requested by Steve Whitehouse. I would have loved to put this all in one revised patch, but there was a rush to get some patches for RHEL5. Therefore, the previous patches were applied to the git tree "as is" and therefore, I'm posting another addendum. Sorry. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:19 +01:00
Nate Diller	0507ecf50f	[GFS2] use zero_user_page Use zero_user_page() instead of open-coding it. Signed-off-by: Nate Diller <nate.diller@gmail.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-07-09 08:22:17 +01:00
Robert Peterson	6c53267f05	[GFS2] Kernel changes to support new gfs2_grow command (part 2) To avoid code redundancy, I separated out the operational "guts" into a new function called read_rindex_entry. Then I made two functions: the closer-to-original gfs2_ri_update (without the special condition checks) and gfs2_ri_update_special that's designed with that condition in mind. (I don't like the name, but if you have a suggestion, I'm all ears). Oh, and there's an added benefit: we don't need all the ugly gotos anymore. ;) This patch has been tested with gfs2_fsck_hellfire (which runs for three and a half hours, btw). Signed-off-By: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:14 +01:00
Robert Peterson	7ae8fa8451	[GFS2] kernel changes to support new gfs2_grow command This is another revision of my gfs2 kernel patch that allows gfs2_grow to function properly. Steve Whitehouse expressed some concerns about the previous patch and I restructured it based on his comments. The previous patch was doing the statfs_change at file close time, under its own transaction. The current patch does the statfs_change inside the gfs2_commit_write function, which keeps it under the umbrella of the inode transaction. I can't call ri_update to re-read the rindex file during the transaction because the transaction may have outstanding unwritten buffers attached to the rgrps that would be otherwise blown away. So instead, I created a new function, gfs2_ri_total, that will re-read the rindex file just to total the file system space for the sake of the statfs_change. The ri_update will happen later, when gfs2 realizes the version number has changed, as it happened before my patch. Since the statfs_change is happening at write_commit time, and there may be multiple writes to the rindex file for one grow operation, one consequence of this restructuring is that instead of getting one kernel message to indicate the change, you may see several. For example, before when you did a gfs2_grow, you'd get a single message like: GFS2: File system extended by 247876 blocks (968MB) Now you get something like: GFS2: File system extended by 207896 blocks (812MB) GFS2: File system extended by 39980 blocks (156MB) This version has also been successfully run against the hours-long "gfs2_fsck_hellfire" test that does several gfs2_grow and gfs2_fsck while interjecting file system damage. It does this repeatedly under a variety of Resource Group conditions. Signed-off-By: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:12 +01:00
Satyam Sharma	3168b0780d	[DLM] fix a couple of races Fix two races in fs/dlm/config.c: (1) Grab the configfs subsystem semaphore before calling config_group_find_obj() in get_space(). This solves a potential race between get_space() and concurrent mkdir(2) or rmdir(2). (2) Grab a reference on the found config_item _while_ holding the configfs subsystem semaphore in get_comm(), and not after it. This solves a potential race between get_comm() and concurrent rmdir(2). Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:10 +01:00
Benjamin Marzinski	b524fe646c	[GFS2] flush the glock completely in inode_go_sync Fix for bz #231910 When filemap_fdatawrite() is called on the inode mapping in data=ordered mode, it will add the glock to the log. In inode_go_sync(), if you do the gfs2_log_flush() before this, after the filemap_fdatawrite() call, the glock and its associated data buffers will be on the log again. This means you can demote a lock from exclusive, without having it flushed from the log. The attached patch simply moves the gfs2_log_flush up to after the filemap_fdatawrite() call. Originally, I tried moving the gfs2_log_flush to after gfs2_meta_sync(), but that caused me to trip the following assert. GFS2: fsid=cypher-36:test.0: fatal: assertion "!buffer_busy(bh)" failed GFS2: fsid=cypher-36:test.0: function = gfs2_ail_empty_gl, file = fs/gfs2/glops.c, line = 61 It appears that gfs2_log_flush() puts some of the glock's buffers in the busy state and the filemap_fdatawrite() call is necessary to flush them. This makes me worry slightly that a related problem could happen because of moving the gfs2_log_flush() after the initial filemap_fdatawrite(), but I assume that gfs2_ail_empty_gl() would catch that case as well. Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:22:07 +01:00
Linus Torvalds	1e5de2837c	Fix permission checking for the new utimensat() system call Commit `1c710c896e` added the utimensat() system call, but didn't handle the case of checking for the writability of the target right, when the target was a file descriptor, not a filename. We cannot use vfs_permission(MAY_WRITE) for that case, and need to simply check whether the file descriptor is writable. The oops from using the wrong function was noticed and narrowed down by Markus Trippelsdorf. Cc: Ulrich Drepper <drepper@redhat.com> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Al Viro <viro@ftp.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-08 12:02:55 -07:00
Steve French	3870253efb	[CIFS] more whitespace fixes Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-08 15:40:40 +00:00
Adrian Bunk	95511ad434	DLM must depend on SYSFS The dependency of DLM on SYSFS got lost in commit `6ed7257b46` resulting in the following compile error with CONFIG_DLM=y, CONFIG_SYSFS=n: <-- snip --> ... LD .tmp_vmlinux1 fs/built-in.o: In function `dlm_lockspace_init': /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/fs/dlm/lockspace.c:231: undefined reference to `kernel_subsys' fs/built-in.o: In function `configfs_init': /home/bunk/linux/kernel-2.6/linux-2.6.22-rc6-mm1/fs/configfs/mount.c:143: undefined reference to `kernel_subsys' make[1]: *** [.tmp_vmlinux1] Error 1 <-- snip --> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-07 14:17:43 -07:00
Steve French	790fe579f5	[CIFS] more whitespace cleanup Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-07 19:25:05 +00:00
Steve French	6dc0f87e35	[CIFS] whitespace cleanup Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-06 23:13:06 +00:00
Steve French	79a58d1f60	[CIFS] whitespace cleanup checkpatch.pl redux Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-06 22:44:50 +00:00
Jeff	d20acd09e3	[CIFS] ipv6 support no longer experimental Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-06 21:13:08 +00:00
Jeff	38c10a1ddb	[CIFS] Mount should fail if server signing off but client mount option requires it Currently, if you mount with a signing-enabled sec= option (e.g. sec=ntlmi), the kernel does a warning printk if the server doesn't support signing, and then proceeds without signatures. This is probably OK for people that think to look at the ring buffer, but seems wrong to me. If someone explicitly requests signing, we should error out if that request can't be satisfied. They can then reattempt the mount without signing if that's ok. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-07-06 21:10:07 +00:00
Michael Ellerman	ef7320edb1	Fix elf_core_dump() when writing arch specific notes (spu coredumps) elf_core_dump() supports dumping arch specific ELF notes, via the #define ELF_CORE_WRITE_EXTRA_NOTES. Currently the only user of this is the powerpc spu coredump code. There is a bug in the handling of foffset WRT the arch notes, which causes us to erroneously increment foffset by the size of the arch notes, leaving a block of zeroes in the file, and causing all subsequent data in the file to be at <supposed position> + <arch note size>. eg: LOAD 0x050000 0x00100000 0x00000000 0x20000 0x20000 R E 0x10000 Tells us we should have a chunk of data at 0x50000. The truth is the data is at 0x90dbc = 0x50000 + 0x40dbc (the size of the arch notes). This bug prevents gdb from reading the core file correctly. The simplest fix is to simply remember the size of the arch notes, and add it to foffset after we've written the arch notes. The only drawback is that if the arch code doesn't write as many bytes as it said it would, we end up with a broken core dump again. For now I think that's a reasonable requirement. Tested on a Cell blade, gdb no longer complains about the core file being bogus. While I'm here I should point out that the spu coredump code does not work if we're dumping to a pipe - we'll have to wait for 23 to fix that. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-06 10:23:43 -07:00
David Woodhouse	e2baf4ed16	[JFFS2] Fix readinode failure when read_dnode() detects CRC failure. We should have stopped returning 1 from read_dnode() to indicate failure. We can just mark the damn thing obsolete immediately. But I missed a case where we don't. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-07-04 10:24:29 -04:00
Zach Brown	fcb82f8835	dio: remove bogus refcounting BUG_ON Badari Pulavarty reported a case of this BUG_ON triggering during testing. It's completely bogus and should be removed. It's trying to notice if we left references to the dio hanging around in the sync case. They should have been dropped as IO completed while this path was in dio_await_completion(). This condition will also be checked, via some twisty logic, by the BUG_ON(ret != -EIOCBQUEUED) a few lines lower. So to start this BUG_ON() is redundant. More fatally, it's dereferencing dio-> after having dropped its reference. It's only safe to dereference the dio after releasing the lock if the final reference was just dropped. Another CPU might free the dio in bio completion and reuse the memory after this path drops the dio lock but before the BUG_ON() is evaluated. This patch passed aio+dio regression unit tests and aio-stress on ext3. Signed-off-by: Zach Brown <zach.brown@oracle.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-03 18:23:23 -07:00
Steve French	d38d8c74c7	[CIFS] whitespace fixes This changeset brought to you ... by patchcheck.pl Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-06-28 19:44:13 +00:00
Steve French	762e5ab77c	[CIFS] Fix sign mount option and sign proc config setting We were checking the wrong (old) global variable to determine whether to override server and force signing on the SMB connection. Acked-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-06-28 18:41:42 +00:00
David Woodhouse	edd5cd4a94	Introduce fixed sys_sync_file_range2() syscall, implement on PowerPC and ARM Not all the world is an i386. Many architectures need 64-bit arguments to be aligned in suitable pairs of registers, and the original sys_sync_file_range(int, loff_t, loff_t, int) was therefore wasting an argument register for padding after the first integer. Since we don't normally have more than 6 arguments for system calls, that left no room for the final argument on some architectures. Fix this by introducing sys_sync_file_range2(int, int, loff_t, loff_t) which all fits nicely. In fact, ARM already had that, but called it sys_arm_sync_file_range. Move it to fs/sync.c and rename it, then implement the needed compatibility routine. And stop the missing syscall check from bitching about the absence of sys_sync_file_range() if we've implemented sys_sync_file_range2() instead. Tested on PPC32 and with 32-bit and 64-bit userspace on PPC64. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-28 11:38:30 -07:00
Andrew Morton	ddc80bd781	ext2: fix return of uninitialised variable gcc correctly says fs/ext2/super.c: In function 'ext2_remount': fs/ext2/super.c:1055: warning: 'err' may be used uninitialized in this function Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-28 11:38:29 -07:00
Davide Libenzi	f8738c5c52	avoid spurious POLLIN returns in signalfd The new code in kernel/signal.c does not allow fetching private signals from another task. This patch avoids spurious POLLIN returns from a signalfd poll(2) operation. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-28 11:34:54 -07:00
Michael Halcrow	d4c5cdb3e0	zero out last page for llseek/write When one llseeks past the end of the file and then writes, every page past the previous end of the file should be cleared. Trevor found that the code, as is, does not assure that the very last page is always cleared. This patch takes care of that. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-28 11:34:53 -07:00
Michael Halcrow	e10f281bca	eCryptfs: initialize crypt_stat in setattr Recent changes in eCryptfs have made it possible to get to ecryptfs_setattr() with an uninitialized crypt_stat struct. This results in a wide and colorful variety of unpleasantries. This patch properly initializes the crypt_stat structure in ecryptfs_setattr() when it is necessary to do so. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-28 11:34:53 -07:00
Michael Halcrow	240e2df5c7	eCryptfs: fix write zeros behavior This patch fixes the processes involved in wiping regions of the data during truncate and write events, fixing a kernel hang in 2.6.22-rc4 while assuring that zero values are written out to the appropriate locations during events in which the i_size will change. The range passed to ecryptfs_truncate() from ecryptfs_prepare_write() includes the page that is the object of ecryptfs_prepare_write(). This leads to a kernel hang as read_cache_page() is executed on the same page in the ecryptfs_truncate() execution path. This patch remedies this by limiting the range passed to ecryptfs_truncate() so as to exclude the page that is the object of ecryptfs_prepare_write(); it also adds code to ecryptfs_prepare_write() to zero out the region of its own page when writing past the i_size position. This patch also modifies ecryptfs_truncate() so that when a file is truncated to a smaller size, eCryptfs will zero out the contents of the new last page from the new size through to the end of the last page. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-28 11:34:53 -07:00
Steve French	467a8f8d48	[CIFS] whitespace cleanup Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-06-27 22:41:32 +00:00
Jeff	5d9c720678	[CIFS] Do not allow signals in cifs_demultiplex_thread Switch from send_sig to force_sig and do not allow signal for this background thread (the signal is needed to wakeup the thread when blocked in the network stack). Signed-off-by: Jeff Layton <jlayton@readhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-06-25 22:16:35 +00:00
Steve French	ffdd6e4d16	[CIFS] fix whitespace More whitespace problems found by checkpatch Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-06-24 21:15:44 +00:00
Steve French	75865f8cc8	[CIFS] Add in some missing flags and cifs README and TODO corrections Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-06-24 18:30:48 +00:00
Kirill Korotaev	e5d2861f31	ext4: lost brelse in ext4_read_inode() One error path in ext4_read_inode() leaks bh because brelse is forgotten. Signed-off-by: Kirill Korotaev <dev@openvz.org> Acked-by: Vasily Averin <vvs@sw.ru> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-24 08:59:12 -07:00
Kirill Korotaev	e4a10a362c	ext3: lost brelse in ext3_read_inode() One error path in ext3_read_inode() leaks bh because brelse is forgotten. Signed-off-by: Kirill Korotaev <dev@openvz.org> Acked-by: Vasily Averin <vvs@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-24 08:59:12 -07:00
Carsten Otte	266f5aa097	ext2: disallow setting xip on remount Yan Zheng pointed out that ext2_remount lacks checking if -o xip should be enabled or not. This patch checks for presence of direct_access on the backing block device and if the blocksize meets the requirements. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Cc: Yan Zheng <yanzheng@21cn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-24 08:59:12 -07:00
Christoph Hellwig	700716c846	[XFS] s/memclear_highpage_flush/zero_user_page/ SGI-PV: 957103 SGI-Modid: xfs-linux-melb:xfs-kern:28678a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-06-19 15:20:31 +10:00
Eric W. Biederman	9d66586f77	shm: fix the filename of hugetlb sysv shared memory Some user space tools need to identify SYSV shared memory when examining /proc/<pid>/maps. To do so they look for a block device with major zero, a dentry named SYSV<sysv key>, and having the minor of the internal sysv shared memory kernel mount. To help these tools and to make it easier for people just browsing /proc/<pid>/maps this patch modifies hugetlb sysv shared memory to use the SYSV<key> dentry naming convention. User space tools will still have to be aware that hugetlb sysv shared memory lives on a different internal kernel mount and so has a different block device minor number from the rest of sysv shared memory. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Albert Cahalan <acahalan@gmail.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-16 13:16:16 -07:00
Jan Kara	74584ae509	udf: fix possible leakage of blocks We have to take care that when we call udf_discard_prealloc() from udf_clear_inode() we have to write inode ourselves afterwards (otherwise, some changes might be lost leading to leakage of blocks, use of free blocks or improperly aligned extents). Also udf_discard_prealloc() does two different things - it removes preallocated blocks and truncates the last extent to exactly match i_size. We move the latter functionality to udf_truncate_tail_extent(), call udf_discard_prealloc() when last reference to a file is dropped and call udf_truncate_tail_extent() when inode is being removed from inode cache (udf_clear_inode() call). We cannot call udf_truncate_tail_extent() earlier as subsequent open+write would find the last block of the file mapped and happily write to the end of it, although the last extent says it's shorter. [akpm@linux-foundation.org: Make checkpatch.pl happier] Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Sandeen <sandeen@sandeen.net> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-16 13:16:16 -07:00
Alexey Dobriyan	edad01e2a1	fuse: ->fs_flags fixlet fs/fuse/inode.c:658:3: error: Initializer entry defined twice fs/fuse/inode.c:661:3: also defined here Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-16 13:16:15 -07:00
Jens Axboe	02676e5aee	splice: only check do_wakeup in splice_to_pipe() for a real pipe We only ever set do_wakeup to non-zero if the pipe has an inode backing, so it's pointless to check outside the pipe->inode check. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-06-15 13:16:13 +02:00
Jens Axboe	00de00bdad	splice: fix leak of pages on short splice to pipe If the destination pipe is full and we already transferred data, we break out instead of waiting for more pipe room. The exit logic looks at spd->nr_pages to see if we moved everything inside the spd container, but we decrement that variable in the loop to decide when spd has emptied. Instead we want to compare to the original page count in the spd, so cache that in a local variable. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-06-15 13:14:22 +02:00
Jens Axboe	17ee4f49ab	splice: adjust balance_dirty_pages_ratelimited() call As we have potentially dirtied more than 1 page, we should indicate as such to the dirty page balancing. So call balance_dirty_pages_ratelimited_nr() and pass in the approximate number of pages we dirtied. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-06-15 13:10:37 +02:00
Dave Kleikamp	288e4d838d	JFS: Update print_hex_dump() syntax Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>	2007-06-13 10:17:50 -05:00
Tejun Heo	dd14cbc994	sysfs: fix race condition around sd->s_dentry, take#2 Allowing attribute and symlink dentries to be reclaimed means sd->s_dentry can change dynamically. However, updates to the field are unsynchronized, leading to race conditions. This patch adds sysfs_lock and uses it to synchronize updates to sd->s_dentry. Due to the locking around ->d_iput, the check in sysfs_drop_dentry() is complex. sysfs_lock only protects the sd->s_dentry pointer itself. The validity of the dentry is protected by dcache_lock, so whether the dentry is alive or not can only be tested while holding both locks. This is a minimal backport of the sysfs_drop_dentry() rewrite in the devel branch. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-06-12 16:08:47 -07:00
Tejun Heo	6aa054aadf	sysfs: fix condition check in sysfs_drop_dentry() The condition check doesn't make much sense as it basically always succeeds. This causes NULL dereferencing on certain cases. It seems that parentheses are put in the wrong place. Fix it. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-06-12 16:08:46 -07:00
Eric Sandeen	dc351252b3	sysfs: store sysfs inode nrs in s_ino to avoid readdir oopses Backport of ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc1/2.6.22-rc1-mm1/broken-out/gregkh-driver-sysfs-allocate-inode-number-using-ida.patch For regular files in sysfs, sysfs_readdir wants to traverse sysfs_dirent->s_dentry->d_inode->i_ino to get to the inode number. But, the dentry can be reclaimed under memory pressure, and there is no synchronization with readdir. This patch follows Tejun's scheme of allocating and storing an inode number in the new s_ino member of a sysfs_dirent, when dirents are created, and retrieving it from there for readdir, so that the pointer chain doesn't have to be traversed. Tejun's upstream patch uses a new-ish "ida" allocator which brings along some extra complexity; this -stable patch has a brain-dead incrementing counter which does not guarantee uniqueness, but because sysfs doesn't hash inodes as iunique expects, uniqueness wasn't guaranteed today anyway. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-06-12 16:08:46 -07:00
Linus Torvalds	3e2ce4dae9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] CIFS should honour umask [CIFS] Missing flag on negprot needed for some servers to force packet signing [CIFS] whitespace cleanup part 2 [CIFS] whitespace cleanup [CIFS] fix mempool destroy done in wrong order in cifs error path [CIFS] typo in previous patch [CIFS] Fix oops on failed cifs mount (in kthread_stop)	2007-06-11 11:39:25 -07:00
Linus Torvalds	5212c555be	Merge branch 'splice-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-block * 'splice-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-block: splice: __generic_file_splice_read: fix read/truncate race splice: __generic_file_splice_read: fix i_size_read() length checks splice: move balance_dirty_pages_ratelimited() outside of splice actor pipe: move pipe_inode_info structure declaration up before it's used splice: remove do_splice_direct() symbol export splice: move inode size check into generic_file_splice_read()	2007-06-11 11:31:05 -07:00
Paul Mundt	dd9505879c	fs: hugetlbfs: Disable for shnommu. SH can turn CONFIG_MMU on and off, don't let us get to a state where hugetlbfs/hugetlbpage gets built when building for nommu. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-06-11 15:35:34 +09:00
Linus Torvalds	845a2fdcbd	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: ocfs2: Fix invalid assertion during write on 64k pages ocfs2: Fix masklog breakage	2007-06-08 19:44:16 -07:00
Greg Ungerer	c287ef1ff9	nommu: report correct errno in message Report the correct errno for out of memory debug output in binfmt_flat.c Signed-off-by: Philippe De Muyter <phdm@macqel.be> Signed-off-by: Greg Ungerer <gerg@uclinux.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-08 17:23:32 -07:00
Steve French	3ce53fc4c5	[CIFS] CIFS should honour umask This patch makes CIFS honour a process' umask like other filesystems. Of course the server is still free to munge the permissions if it wants to; but the client will send the "right" permissions to begin with. A few caveats: 1) It only applies to filesystems that have CAP_UNIX (aka support unix extensions) 2) It applies the correct mode to the follow up CIFSSMBUnixSetPerms() after remote creation When mode to CIFS/NTFS ACL mapping is complete we can do the same thing for that case for servers which do not support the Unix Extensions. Signed-off-by: Matt Keenen <matt@opcode-solutions.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-06-08 14:55:14 +00:00
Jens Axboe	620a324b74	splice: __generic_file_splice_read: fix read/truncate race Original patch and description from Neil Brown <neilb@suse.de>, merged and adapted to splice branch by me. Neil's text follows: __generic_file_splice_read() currently samples the i_size at the start and doesn't do so again unless it needs to call ->readpage to load a page. After ->readpage it has to re-sample i_size as a truncate may have caused that page to be filled with zeros, and the read() call should not see these. However there are other activities that might cause ->readpage to be called on a page between the time that __generic_file_splice_read() samples i_size and when it finds that it has an uptodate page. These include at least read-ahead and possibly another thread performing a read. So we must sample i_size after it has an uptodate page. Thus the current sampling at the start and after a read can be replaced with a sampling before page addition into spd. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-06-08 08:34:11 +02:00
Hugh Dickins	475ecade68	splice: __generic_file_splice_read: fix i_size_read() length checks __generic_file_splice_read's partial page check, at eof after readpage, not only got its calculations wrong, but also reused the loff variable: causing data corruption when splicing from a non-0 offset in the file's last page (revealed by ext2 -b 1024 testing on a loop of a tmpfs file). Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-06-08 08:34:05 +02:00
Jens Axboe	20d698db67	splice: move balance_dirty_pages_ratelimited() outside of splice actor I've seen inode related deadlocks, so move this call outside of the actor itself, which may hold the inode lock. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-06-08 08:33:59 +02:00
Jens Axboe	267adc3e66	splice: remove do_splice_direct() symbol export It's only supposed to be used by do_sendfile(), which is never modular. So kill the export. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-06-08 08:33:41 +02:00
Jens Axboe	d366d39885	splice: move inode size check into generic_file_splice_read() Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-06-08 08:32:38 +02:00
Bryan Wu	85f6038f21	RAMFS NOMMU: missed POSIX UID/GID inode attribute checking This bug was caught by LTP testcase fchmod06 on Blackfin platform. In the manpage of fchmod, "EPERM: The effective UID does not match the owner of the file, and the process is not privileged (Linux: it does not have the CAP_FOWNER capability)." But the ramfs nommu code missed the inode_change_ok POSIX UID/GID verification. This patch fixed this. Signed-off-by: Bryan Wu <bryan.wu@analog.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-07 17:11:13 -07:00
Mark Fasheh	eeb47d1234	ocfs2: Fix invalid assertion during write on 64k pages The write path code intends to bug if a math error (or unhandled case) results in a write outside of the current cluster boundaries. The actual BUG_ON() statements however are incorrect, leading to a crash on kernels with 64k page size. Fix those by checking against the right variables. Also, move the assertions higher up within the functions so that they trip before the code starts to mark buffers. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-06-06 16:42:03 -07:00
Tiger Yang	59be7dc97b	ocfs2: Fix masklog breakage Some of the sysfs changes inadvertently broke the simple runtime debug log filtering employed in ocfs2. Fix this by properly exporting the masklog category filter names. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-06-06 16:41:08 -07:00
Dave Kleikamp	209e101bf4	JFS: use print_hex_dump() rather than private dump_mem() function Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>	2007-06-06 16:30:17 -05:00
Dave Kleikamp	f720e3ba55	JFS: Whitespace cleanup and remove some dead code Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>	2007-06-06 15:28:35 -05:00
Yehuda Sadeh Weinraub	100c1ddc98	[CIFS] Missing flag on negprot needed for some servers to force packet signing A related signature issue that I came across. There's a bug in win2k that when NT error codes are not negotiated, the server doesn't respond that signatures are mandatory. Since there's (currently) no way to turn on signatures in such a case, I had to force NT error codes, so that this bug will not occur. Signed-off-by: Yehuda Sadeh Weinraub <Yehuda.Sadeh@expand.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-06-05 21:31:16 +00:00
Steve French	221601c3d1	[CIFS] whitespace cleanup part 2 Various coding style problems found by running the new checkpatch.pl script against fs/cifs. 3 more files fixed up. Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-06-05 20:35:06 +00:00
Steve French	5fdae1f681	[CIFS] whitespace cleanup Various coding style problems found by running fs/cifs against the new checkpatch.pl script. Since there were too many to fit in one patch. Updated the first four files. Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-06-05 18:30:44 +00:00
Linus Torvalds	ec4883b015	Merge git://git.infradead.org/mtd-2.6 * git://git.infradead.org/mtd-2.6: [JFFS2] Fix obsoletion of metadata nodes in jffs2_add_tn_to_tree() [MTD] Fix error checking after get_mtd_device() in get_sb_mtd functions [JFFS2] Fix buffer length calculations in jffs2_get_inode_nodes() [JFFS2] Fix potential memory leak of dead xattrs on unmount. [JFFS2] Fix BUG() caused by failing to discard xattrs on deleted files. [MTD] generalise the handling of MTD-specific superblocks [MTD] [MAPS] don't force uclinux mtd map to be root dev	2007-06-04 17:54:09 -07:00
Andrew Morton	78ae87c3cd	vanishing ioctl handler debugging We've had several reports of the CPU jumping to 0x00000000 in do_ioctl(). I assume that there's a race and someone is zeroing out the ioctl handler while this CPU waits for the lock_kernel(). The patch adds code to detect this, then emits stuff which will hopefully lead us to the culprit. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-04 13:25:10 -07:00
Akinobu Mita	e6985c7f68	[CIFS] fix mempool destroy done in wrong order in cifs error path A slab cache used as a memory pool cannot be destroyed before the memory pool itself, because the memory pool still holds some objects and kmem_cache_destroy() says "Can't free all objects". Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-06-04 16:14:59 +00:00
David Woodhouse	0477d24e2a	[JFFS2] Fix obsoletion of metadata nodes in jffs2_add_tn_to_tree() We should keep the mdata node with higher version number, not just the one we happen to find latest. Doh. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-06-01 20:04:43 +01:00
Yoann Padioleau	f834368564	parse errors in ifdefs Fix various bits of obviously-busted code which we're not happening to compile, due to ifdefs. Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Jeff Garzik <jeff@garzik.org> Cc: Jan Kara <jack@ucw.cz> Cc: James Bottomley <James.Bottomley@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-01 08:18:28 -07:00
Jan Kara	85d71244f0	Fix possible UDF data corruption update_next_aext() could possibly rewrite values in elen and eloc, possibly leading to data corruption when rewriting a file. Use temporary variables instead. Also advance cur_epos as it can also point to an indirect extent pointer. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-06-01 08:18:27 -07:00
Artem Bityutskiy	ea55d30798	[JFFS2] Fix buffer length calculations in jffs2_get_inode_nodes() If we have already read enough bytes, no need to call read_more(). Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-06-01 13:20:29 +01:00
Alex Tomas	315054f023	When ext4_ext_insert_extent() fails to insert new blocks we should free just the allocated blocks. Signed-off-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-05-31 16:20:15 -04:00
Amit Arora	25d14f983f	ext4: Extent overlap bugfix This patch adds a check for overlap of extents and cuts short the new extent to be inserted, if there is a chance of overlap. Signed-off-by: Amit Arora <aarora@in.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-05-31 16:20:15 -04:00
Mingming Cao	8a9dc94498	Remove unnecessary exported symbols. Signed-Off-By: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-05-31 16:20:15 -04:00
Dave Kleikamp	8c55e20411	EXT4: Fix whitespace Replace a lot of spaces with tabs Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-05-31 16:20:14 -04:00
Andrew Morton	00c541eae7	afs: needs sched.h mips: fs/afs/flock.c: In function `afs_lock_may_be_available': fs/afs/flock.c:55: error: dereferencing pointer to incomplete type fs/afs/flock.c: In function `afs_lock_work': fs/afs/flock.c:84: error: dereferencing pointer to incomplete type fs/afs/flock.c:89: error: dereferencing pointer to incomplete type fs/afs/flock.c:109: error: dereferencing pointer to incomplete type fs/afs/flock.c:135: error: dereferencing pointer to incomplete type fs/afs/flock.c:143: error: dereferencing pointer to incomplete type fs/afs/flock.c:158: error: dereferencing pointer to incomplete type fs/afs/flock.c:161: error: dereferencing pointer to incomplete type fs/afs/flock.c:179: error: `TASK_UNINTERRUPTIBLE' undeclared (first use in this function) fs/afs/flock.c:179: error: (Each undeclared identifier is reported only once fs/afs/flock.c:179: error: for each function it appears in.) fs/afs/flock.c:179: error: `TASK_INTERRUPTIBLE' undeclared (first use in this function) fs/afs/flock.c:182: error: dereferencing pointer to incomplete type Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-31 07:58:14 -07:00
Andrew Morton	1fc799e1b4	ntfs_init_locked_inode(): fix array indexing Local variable `i' is a byte-counter. Don't use it as an index into an array of le32's. Reported-by: "young dave" <hidave.darkstar@gmail.com> Cc: "Christoph Lameter" <clameter@sgi.com> Acked-by: Anton Altaparmakov <aia21@cantab.net> Cc: <stable@kernel.org> Cc: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-31 07:58:13 -07:00
Bryan Wu	3f0a6766e0	a bug in ramfs_nommu_resize function, passing old size to vmtruncate It should pass "newsize" to the vmtruncate function to modify the inode->i_size, but the old size is passed to vmtruncate instead. This bug was caught by LTP truncate test case on Blackfin platform. After it was fixed, the LTP truncate test case passed. Signed-off-by: Bryan Wu <bryan.wu@analog.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-30 20:54:07 -07:00
Trond Myklebust	b4946ffb18	NFS: Fix a refcount leakage in O_DIRECT The current code is leaking a reference to dreq->kref when the calls to nfs_direct_read_schedule() and nfs_direct_write_schedule() return an error. This patch moves the call to kref_put() from nfs_direct_wait() back into nfs_direct_read() and nfs_direct_write() (which are the functions that actually took the reference in the first place) fixing the leak. Thanks to Denis V. Lunev for spotting the bug and proposing the original fix. Acked-by: Denis V. Lunev <dlunev@gmail.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-30 16:26:01 -04:00
David Chinner	df3c724426	[XFS] Write at EOF may not update filesize correctly. The recent fix for preventing NULL files from being left around does not update the file size correctly in all cases. The missing case is a write extending the file that does not need to allocate a block. In that case we used a read mapping of the extent which forced the use of the read I/O completion handler instead of the write I/O completion handler. Hence the file size was not updated on I/O completion. SGI-PV: 965068 SGI-Modid: xfs-linux-melb:xfs-kern:28657a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Nathan Scott <nscott@aconex.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-05-29 18:15:17 +10:00
Hugh Dickins	f4d43bd579	fix compat console unimap regression Why is it that since the `2f1a2ccb9c` console UTF-8 fixes went into 2.6.22-rc1, the PowerMac G5 shows only inverse video question marks for the text on tty2-6? whereas tty1 is fine, and so is x86. No fault of that patch: by removing the old fallback behaviour, it reveals that 32-bit setfont running on 64-bit kernels has only really worked on the current console, the rest getting faked by that inadequate fallback. Bring the compat do_unimap_ioctl into line with the main one: PIO_UNIMAP and GIO_UNIMAP apply to the specified tty, not redirected to fg_console. Use the same checks, and most particularly, remember to check access_ok: con_set_unimap and con_get_unimap are using __get_user and __put_user. And the compat vt_check should ask for the same capability as the main one, CAP_SYS_TTY_CONFIG rather than CAP_SYS_ADMIN. Added in vt_ioctl's vc_cons_allocated check for safety, though failure may well be impossible. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-25 17:37:46 -07:00
Christoph Hellwig	d9b08b9efe	[PATCH] ocfs2: use generic_segment_checks Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-05-25 11:06:37 -07:00
Mark Fasheh	8fccfc829a	ocfs2: fix inode leak We weren't cleaning up our inode reference on error in ocfs2_reserve_local_alloc_bits(). Add a check for error return and iput() if need be. Move the code to set the alloc context inode info to the end of the function so we don't have any possibility of passing back a bad pointer. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-05-25 11:00:46 -07:00
Nate Diller	5c3c6bb770	[PATCH] ocfs2: use zero_user_page Use zero_user_page() instead of open-coding it. Signed-off-by: Nate Diller <nate.diller@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-05-25 11:00:39 -07:00
Mark Fasheh	1024c902ab	ocfs2: unmap_mapping_range() in ocfs2_truncate() We weren't calling this before, but since ocfs2 handles the entire truncate operation, we should. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-05-25 11:00:31 -07:00
Mark Fasheh	e9dfc0b2bc	ocfs2: trylock in ocfs2_readpage() Similarly to the page lock / cluster lock inversion in ocfs2_readpage, we can deadlock on ip_alloc_sem. We can down_read_trylock() instead and just return AOP_TRUNCATED_PAGE if the operation fails. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2007-05-25 11:00:23 -07:00
Linus Torvalds	d333fc8d30	Merge branch 'fixes' of git://git.linux-nfs.org/pub/linux/nfs-2.6 * 'fixes' of git://git.linux-nfs.org/pub/linux/nfs-2.6: NFS: Fix nfs_direct_dirty_pages() NFS: Fix handful of compiler warnings in direct.c NFS: Avoid a deadlock situation on write	2007-05-24 09:17:12 -07:00
Trond Myklebust	d4a8f3677f	NFS: Fix nfs_direct_dirty_pages() We only need to dirty the pages that were actually read in. Also convert nfs_direct_dirty_pages() to call set_page_dirty() instead of set_page_dirty_lock(). A call to lock_page() is unacceptable in an rpciod callback function. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-24 11:18:18 -04:00
Chuck Lever	749e146e01	NFS: Fix handful of compiler warnings in direct.c This patch fixes a couple of signedness issues that were causing an Oops when running the LTP diotest4 test. get_user_pages() returns a signed error, hence we need to be careful when comparing with the unsigned number of pages from data->npages. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-24 10:44:20 -04:00
Trond Myklebust	7fe7f8487a	NFS: Avoid a deadlock situation on write When processes are allowed to attempt to lock a non-contiguous range of nfs write requests, it is possible for generic_writepages to 'wrap round' the address space, and call writepage() on a request that is already locked by the same process. We avoid the deadlock by checking if the page index is contiguous with the list of nfs write requests that is already held in our nfs_pageio_descriptor prior to attempting to lock a new request. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-24 10:44:20 -04:00
Michael Halcrow	53a2731f93	eCryptfs: delay writing 0's after llseek until write Delay writing 0's out in eCryptfs after a seek past the end of the file until data is actually written. http://www.opengroup.org/onlinepubs/009695399/functions/lseek.html ``The lseek() function shall not, by itself, extend the size of a file.'' Without this fix, applications that lseek() past the end of the file without writing will experience unexpected behavior. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-23 20:14:15 -07:00
Davi Arnaut	b3762bfc8d	signalfd: retrieve multiple signals with one read() call Gathering signals in bulk enables server applications to drain a signal queue (almost full of realtime signals) more efficiently by reducing the syscall and file look-up overhead. Very similar to the sigtimedwait4() call described by Niels Provos, Chuck Lever, and Stephen Tweedie in a paper entitled "Analyzing the Overload Behavior of a Simple Web Server". The paper lists more details and advantages. Signed-off-by: Davi E. M. Arnaut <davi@haxent.com.br> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-23 20:14:14 -07:00
Miklos Szeredi	ead5f0b5fa	fuse: delete inode on drop When inode is dropped (no more references) delete it from cache. There's not much point in keeping it cached, when a new lookup will refresh the attributes anyway. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-23 20:14:13 -07:00
Miklos Szeredi	889f784831	fuse: generic_write_checks() for direct_io This fixes O_APPEND in direct IO mode. Also checks writes against file size limits, notably rlimits. Reported by Greg Bruno. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-23 20:14:13 -07:00
Christoph Hellwig	492c8b332e	uselib: add missing MNT_NOEXEC check We don't allow loading ELF shared libraries from noexec mount points, so the same should apply to sys_uselib as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Ulrich Drepper <drepper@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-23 20:14:13 -07:00
David Woodhouse	5a1b639148	Missing 'const' from reiserfs MIN_KEY declaration. In stree.c, MIN_KEY is declared const. The extern declaration in dir.c doesn't match... Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-23 20:14:13 -07:00
Badari Pulavarty	6087b2dab2	optimize compat_core_sys_select() by using stack space for small fd sets Optimize select by using stack space for small fd sets. core_sys_select() already has this optimization. This is for the compat version. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-23 20:14:12 -07:00
Miklos Szeredi	b9ba347f27	fuse: fix mknod of regular file The wrong lookup flag was tested in ->create() causing havoc (error or Oops) when a regular file was created with mknod() in a fuse filesystem. Thanks to J. Cameijo Cerdeira for the report. Kernels 2.6.18 onward are affected. Please apply to -stable as well. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-23 20:14:11 -07:00
Steve French	f7f7c31c98	[CIFS] typo in previous patch (also fixed missing space after if) Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-05-24 02:29:51 +00:00
Steve French	28356a1679	[CIFS] Fix oops on failed cifs mount (in kthread_stop) If the cifs demultiplex thread wakes up and exits (zeroing server->tsk) before kthread_stop is called, the cifs_mount code could pass a null pointer to kthread_stop Thanks to akpm, Dave Young and Shaggy for suggesting earlier versions of this patch. CC: akpm@linux-foundatior.org Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2007-05-23 14:45:36 +00:00
Linus Torvalds	cdb7532f7b	Merge master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6: sh: Fix dreamcast build for IRQ changes. sh: Fix clock multiplier on SH7722. sh: Wire up kdump crash kernel exec in die(). sh: sr.bl toggling around idle sleep. sh: disable genrtc support. fs: Kill sh dependency for binfmt_flat. sh: Disable psw support for R7785RP. sh: Fix page size alignment in __copy_user_page(). sh: Fix up various compile warnings for SE boards. sh: Wire up signalfd/timerfd/eventfd syscalls. sh: revert addition of page fault notifiers spelling fixes: arch/sh/ input: hp680_ts compile fixes. sh: landisk: Header cleanups. sh: landisk: rtc-rs5c313 support. sh: Kill off pmb slab cache destructor. sh: Fix up psw build rules for r7780rp. sh: Shut up compiler warnings in __do_page_fault().	2007-05-22 17:26:18 -07:00
Jeff Garzik	72dd9ca599	partitions/LDM: build fix This from a "tested" patch... Signed-off-by: Jeff Garzik <jeff@garzik.org> Cc: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-21 21:38:17 -07:00
Anton Altaparmakov	dde33348e5	LDM: Fix for Windows Vista dynamic disks This fixes the LDM driver so that it works with Windows Vista dynamic disks which are subtly different to Windows 2000/XP ones. The patch was needed to get a Vista formatted dynamic disk to be recognized and parsed successfully. Thanks go to Chris Teachworth for the report and testing. Cc: Richard Russon <ldm@flatcap.org> Signed-off-by: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-21 09:58:40 -07:00
Alexey Dobriyan	e8edc6e03a	Detach sched.h from mm.h First thing mm.h does is including sched.h solely for can_do_mlock() inline function which has "current" dereference inside. By dealing with can_do_mlock() mm.h can be detached from sched.h which is good. See below, why. This patch a) removes unconditional inclusion of sched.h from mm.h b) makes can_do_mlock() normal function in mm/mlock.c c) exports can_do_mlock() to not break compilation d) adds sched.h inclusions back to files that were getting it indirectly. e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were getting them indirectly Net result is: a) mm.h users would get less code to open, read, preprocess, parse, ... if they don't need sched.h b) sched.h stops being dependency for significant number of files: on x86_64 allmodconfig touching sched.h results in recompile of 4083 files, after patch it's only 3744 (-8.3%). Cross-compile tested on all arm defconfigs, all mips defconfigs, all powerpc defconfigs, alpha alpha-up arm i386 i386-up i386-defconfig i386-allnoconfig ia64 ia64-up m68k mips parisc parisc-up powerpc powerpc-up s390 s390-up sparc sparc-up sparc64 sparc64-up um-x86_64 x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig as well as my two usual configs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-21 09:18:19 -07:00
OGAWA Hirofumi	ff1be9ad61	Fix "fs: convert core functions to zero_user_page" The bug was introduced by `01f2705daf`. It failed to convert the first argument; it should be "new_page". This became a cause of fatfs corruption. Cc: Nate Diller <nate.diller@gmail.com> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-21 09:15:32 -07:00
Paul Mundt	1d4be747a8	fs: Kill sh dependency for binfmt_flat. Not really sure where this bogosity came from, but there's certainly nothing special about sh that lets us use flat files with the MMU on. Kill the dependency, and leave it as !MMU, like it is for all of the other nommu-wielding ports. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-21 14:34:00 +09:00
David Woodhouse	2ad8ee7135	[JFFS2] Fix potential memory leak of dead xattrs on unmount. An xattr_datum which ends up orphaned should be freed by the GC thread. But if we umount before the GC thread is finished, or if we mount read-only and the GC thread never runs, they might never be freed. Clean them up during unmount, if there are any left. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-05-20 11:30:38 -04:00
David Woodhouse	8ae5d31263	[JFFS2] Fix BUG() caused by failing to discard xattrs on deleted files. When we cannot mark nodes as obsolete, such as on NAND flash, we end up having to delete inodes with !nlink in jffs2_build_remove_unlinked_inode(). However, jffs2_build_xattr_subsystem() runs later than this, and will attach an xref to the dead inode. Then later when the last nodes of that dead inode are erased we hit a BUG() in jffs2_del_ino_cache() because we're not supposed to get there with an xattr still attached to the inode which is being killed. The simple fix is to refrain from attaching xattrs to inodes with zero nlink, in jffs2_build_xattr_subsystem(). It's OK to trust nlink here because the file system isn't actually mounted yet, so there's no chance that a zero-nlink file could actually be alive still because it's open. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-05-20 11:28:22 -04:00
Davide Libenzi	18963c01b8	timerfd use waitqueue lock ... The timerfd was using the unlocked waitqueue operations, but it was using a different lock, so poll_wait() would race with it. This makes timerfd directly use the waitqueue lock. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-18 13:09:34 -07:00
Davide Libenzi	d48eb23315	eventfd use waitqueue lock ... The eventfd was using the unlocked waitqueue operations, but it was using a different lock, so poll_wait() would race with it. This makes eventfd directly use the waitqueue lock. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-18 13:09:34 -07:00
Trond Myklebust	dd504ea16f	Merge branch 'master' of /home/trondmy/repositories/git/linux-2.6/	2007-05-17 11:36:59 -04:00
Christoph Lameter	ea125892a1	Fix page allocation flags in grow_dev_page() grow_dev_page() simply passes GFP_NOFS to find_or_create_page. This means the allocation of radix tree nodes is done with GFP_NOFS and the allocation of a new page is done using GFP_NOFS. The mapping has a flags field that contains the necessary allocation flags for the page cache allocation. These need to be consulted in order to get DMA and HIGHMEM allocations etc right. And yes a blockdev could be allowing Highmem allocations if it's a ramdisk. Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-17 05:23:06 -07:00
Jan Kara	7925409e20	circular locking dependency found in QUOTA OFF i_mutex on quota files is special. Unlike i_mutexes for other inodes it is acquired under dqonoff_mutex. Tell lockdep about this lock ranking. Also comment and code in quota_sync_sb() seem to be bogus (as i_mutex for quota file can be acquired under dqonoff_mutex). Move truncate_inode_pages() call under dqonoff_mutex and save some problems with races... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-17 05:23:05 -07:00
Nate Diller	c9f2875b79	ecryptfs: use zero_user_page Use zero_user_page() instead of open-coding it. Signed-off-by: Nate Diller <nate.diller@gmail.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-17 05:23:05 -07:00
Dan Aloni	71ce92f3fa	make sysctl/kernel/core_pattern and fs/exec.c agree on maximum core filename size Make sysctl/kernel/core_pattern and fs/exec.c agree on maximum core filename size and change it to 128, so that extensive patterns such as '/local/cores/%e-%h-%s-%t-%p.core' won't result in truncated filename generation. Signed-off-by: Dan Aloni <da-x@monatomic.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-17 05:23:05 -07:00
Trond Myklebust	5cf4cf65a8	Merge branch 'master' of /home/trondmy/repositories/git/linux-2.6/	2007-05-17 08:23:04 -04:00
Heiko Carstens	8317f14b60	simplify compat_sys_timerfd Just thought this is easier to read. Acked-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-17 05:23:04 -07:00
Christoph Lameter	a35afb830f	Remove SLAB_CTOR_CONSTRUCTOR SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: David Howells <dhowells@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@ucw.cz> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-17 05:23:04 -07:00
David Howells	bb33ed6345	AFS: Fix afs_prepare_write() afs_prepare_write() should not mark a page up to date if it only partially fills it in, in expectation of the caller filling in the rest prior to calling commit_write(). commit_write(), however, should mark the page up to date. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-16 21:19:15 -07:00
David Howells	faab83bbcd	AFS: write back dirty data on unmount Fix AFS to write back dirty data on unmount. This didn't happen because afs_super_ops.drop_inode was pointing to generic_delete_inode. Now this pointer is left set to NULL so that the default behaviour occurs instead. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-16 21:19:15 -07:00
Trond Myklebust	6684e323a2	Merge branch 'origin'	2007-05-15 16:11:17 -04:00
Davide Libenzi	f0ee9aabb0	epoll: move kfree inside ep_free Move the kfree() call inside the ep_free() function. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-15 08:54:00 -07:00
Davide Libenzi	67647d0fb8	epoll: fix some comments Fixes some epoll code comments. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-15 08:54:00 -07:00
Davide Libenzi	c7ea763025	epoll locks changes and cleanups Changes the rwlock to a spinlock, and drops the use-count variable. Operations are always bound by the mutex now, so the use-count is no longer needed. For the same reason, the rwlock can become a simple spinlock. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-15 08:53:59 -07:00
Davide Libenzi	d47de16c72	fix epoll single pass code and add wait-exclusive flag Fixes the epoll single pass code. During the unlocked event delivery (to userspace) code, the poll callback can re-issue new events, and we must receive them correctly. Since we loop in a lockless fashion, want to be O(nready), and don't want to toggle the spinlock on and off for every event, we have the poll callback use a secondary list to queue events while we're inside the event delivery loop. The rw_semaphore has been turned into a mutex. This patch also adds the wait-exclusive flag, as suggested by Davi Arnaut. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-15 08:53:59 -07:00
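As a rough illustration of the secondary-list idea in the commit above, here is a hypothetical, single-threaded C model. The names evq, evq_post(), evq_deliver() and repost_sink() are invented for this sketch and are not the fs/eventpoll.c code. Events posted while the delivery loop runs are parked on an overflow list and moved back onto the ready list once the pass is over.

```c
#include <stdbool.h>
#include <stddef.h>

#define EVQ_MAX 16

/* Hypothetical toy model, not the fs/eventpoll.c data structures. */
struct evq {
	int ready[EVQ_MAX];	/* events waiting to be delivered */
	size_t nready;
	int ovf[EVQ_MAX];	/* events posted while delivery is running */
	size_t novf;
	bool delivering;
};

/* Analogue of the poll callback: queue a newly ready event. */
void evq_post(struct evq *q, int id)
{
	if (q->delivering) {
		if (q->novf < EVQ_MAX)
			q->ovf[q->novf++] = id;	/* never touch the list being walked */
	} else if (q->nready < EVQ_MAX) {
		q->ready[q->nready++] = id;
	}
}

/* One O(nready) delivery pass. Events posted by `sink` during the pass
 * land on the overflow list and are spliced back onto the ready list
 * afterwards. (The toy has no locking at all; in the kernel the point is
 * to avoid taking the lock once per event.) Returns events delivered. */
size_t evq_deliver(struct evq *q, void (*sink)(struct evq *, int))
{
	size_t delivered = q->nready;

	q->delivering = true;
	for (size_t i = 0; i < delivered; i++)
		sink(q, q->ready[i]);
	q->nready = 0;
	q->delivering = false;

	for (size_t i = 0; i < q->novf; i++)
		evq_post(q, q->ovf[i]);
	q->novf = 0;
	return delivered;
}

/* Example sink: delivering event 1 makes the file ready again as event 101. */
void repost_sink(struct evq *q, int id)
{
	if (id == 1)
		evq_post(q, 100 + id);
}
```

With events 1 and 2 queued, one pass delivers both, and the event re-posted during delivery (101) is waiting on the ready list for the next pass instead of being lost.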
Trond Myklebust	d48c5f4100	NLM: Fix sparse warnings - fs/lockd/xdr4.c:140:27: warning: incorrect type in argument 2 (different explicit signedness) - fs/lockd/xdr4.c:141:27: warning: incorrect type in argument 2 (different explicit signedness) - fs/lockd/xdr4.c:432:28: warning: incorrect type in argument 2 (different explicit signedness) - fs/lockd/xdr4.c:433:28: warning: incorrect type in argument 2 (different explicit signedness) - fs/lockd/xdr4.c:587:20: warning: symbol 'nlm_version4' was not declared. Should it be static? Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-14 19:33:46 -04:00
Trond Myklebust	2e42c3e2ae	NFS: Fix more sparse warnings - fs/nfs/nfs4xdr.c:2499:42: warning: incorrect type in argument 2 (different signedness) - fs/nfs/nfs4xdr.c:2658:49: warning: incorrect type in argument 4 (different explicit signedness) - fs/nfs/nfs4xdr.c:2683:50: warning: incorrect type in argument 4 (different explicit signedness) - fs/nfs/nfs4xdr.c:3063:68: warning: incorrect type in argument 4 (different explicit signedness) - fs/nfs/nfs4xdr.c:3065:68: warning: incorrect type in argument 4 (different explicit signedness) - fs/nfs/callback_xdr.c:138:31: warning: incorrect type in argument 2 (different signedness) Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-14 19:33:46 -04:00
Trond Myklebust	10afec9081	NFS: Fix some 'sparse' warnings... - fs/nfs/dir.c:610:8: warning: symbol 'nfs_llseek_dir' was not declared. Should it be static? - fs/nfs/dir.c:636:5: warning: symbol 'nfs_fsync_dir' was not declared. Should it be static? - fs/nfs/write.c:925:19: warning: symbol 'req' shadows an earlier one - fs/nfs/write.c:61:6: warning: symbol 'nfs_commit_rcu_free' was not declared. Should it be static? - fs/nfs/nfs4proc.c:793:5: warning: symbol 'nfs4_recover_expired_lease' was not declared. Should it be static? Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-14 19:33:46 -04:00
Trond Myklebust	8ae20abdd1	NFS4: Fix incorrect use of sizeof() in fs/nfs/nfs4xdr.c The XDR code should not depend on the physical allocation size of structures like nfs4_stateid and nfs4_verifier since those may have to change at some future date. We therefore replace all uses of sizeof() with constants like NFS4_VERIFIER_SIZE and NFS4_STATEID_SIZE. This also has the side-effect of fixing some warnings of the type format ‘%u’ expects type ‘unsigned int’, but argument X has type ‘long unsigned int’ on 64-bit systems Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-14 19:33:45 -04:00
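A minimal sketch of the principle, assuming a hypothetical in-memory verifier type: the number of bytes put on the wire comes from a protocol constant, not from sizeof(), so adding a field to the structure cannot change the encoding. NFS4_VERIFIER_SIZE is defined locally here as 8 (an NFSv4 verifier4 is 8 opaque bytes); struct toy_verifier and encode_verifier() are illustrative only.

```c
#include <string.h>

#define NFS4_VERIFIER_SIZE 8	/* verifier4 is 8 opaque bytes on the wire */

/* Hypothetical in-memory type: if it ever grows, sizeof() grows with it,
 * but the wire format must not. */
struct toy_verifier {
	unsigned char data[NFS4_VERIFIER_SIZE];
	unsigned long flags;		/* illustrative extra field */
};

/* Encode exactly NFS4_VERIFIER_SIZE bytes (not sizeof(*v)) at p and
 * return the position just past them. */
unsigned char *encode_verifier(unsigned char *p, const struct toy_verifier *v)
{
	memcpy(p, v->data, NFS4_VERIFIER_SIZE);
	return p + NFS4_VERIFIER_SIZE;
}
```

Because sizeof(struct toy_verifier) includes the extra field and padding, using it here would put extra bytes on the wire; the constant keeps the encoding at 8 bytes.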
Nate Diller	60945cb7c8	NFS: use zero_user_page Use zero_user_page() instead of the newly deprecated memclear_highpage_flush(). Signed-off-by: Nate Diller <nate.diller@gmail.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-14 19:33:45 -04:00
Oleg Nesterov	550facd138	NLM: don't use CLONE_SIGHAND in nlmclnt_recovery reclaimer() calls allow_signal(), which modifies ->sighand; with CLONE_SIGHAND, that is the parent process's ->sighand. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-14 19:33:44 -04:00
Trond Myklebust	21051ba625	NLM: Fix locking client timeouts... nlmsvc_timeout is already in units of HZ... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-14 19:33:44 -04:00
Nate Diller	e3bf460f3e	ntfs: use zero_user_page Use zero_user_page() instead of open-coding it. [akpm@linux-foundation.org: kmap-type fixes] Signed-off-by: Nate Diller <nate.diller@gmail.com> Acked-by: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-12 10:55:39 -07:00
Linus Torvalds	853da00220	Merge branch 'audit.b38' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b38' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: [PATCH] Abnormal End of Processes [PATCH] match audit name data [PATCH] complete message queue auditing [PATCH] audit inode for all xattr syscalls [PATCH] initialize name osid [PATCH] audit signal recipients [PATCH] add SIGNAL syscall class (v3) [PATCH] auditing ptrace	2007-05-11 09:57:16 -07:00
Davide Libenzi	7699acd134	epoll cleanups: epoll remove static pre-declarations and akpm-ize the code Re-arrange epoll code to avoid static functions pre-declarations, and apply akpm-filter on it. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:37 -07:00
Davide Libenzi	cea6924187	epoll cleanups: epoll no module Epoll is either compiled in, or not (if EMBEDDED). Remove the module code and use fs_initcall(). Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:37 -07:00
Davide Libenzi	da66f7cb0f	epoll: use anonymous inodes Cut out lots of code from epoll, by reusing the anonymous inode source patch (fs/anon_inodes.c). Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:37 -07:00
Davide Libenzi	9c3060bedd	signal/timer/event: KAIO eventfd support example This is an example of how to add eventfd support to the current KAIO code, in order to enable KAIO to post readiness events to a pollable fd (hence compatible with POSIX select/poll). The KAIO code simply signals the eventfd fd when events are ready, and this triggers a POLLIN in the fd. This patch uses a reserved-for-future-use member of the struct iocb to pass an eventfd file descriptor, which KAIO will use to post an event every time a request completes. At that point, io_getevents() will return the completed result in a struct io_event. I made a quick test program to verify the patch, and it runs fine here: http://www.xmailserver.org/eventfd-aio-test.c The test program uses poll(2), but it would of course work with select and epoll too. This makes it possible to schedule both block I/O and requests on other pollable devices, and to wait for the results using select/poll/epoll. In a typical scenario, an application would submit KAIO requests using io_submit(), use epoll_ctl() on the whole other class of devices (which, with the addition of signals, timers and user events, is now pretty much complete), and then would: epoll_wait(...); for_each_event { if (curr_event_is_kaiofd) { io_getevents(); dispatch_aio_events(); } else { dispatch_epoll_event(); } } Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:37 -07:00
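A sketch of the dispatch loop from the message above, using an eventfd as a stand-in for the fd that KAIO would signal. This assumes the modern Linux epoll_create1() and eventfd(initval, flags) wrappers; TAG_AIO and dispatch_once() are hypothetical, and a real program would call io_getevents(2) at the point marked in the comment.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

enum { TAG_AIO = 1 };	/* hypothetical tag identifying the "KAIO" fd */

/* Register an eventfd (standing in for the fd KAIO would signal) with epoll,
 * signal it once, and run one pass of the epoll_wait() dispatch loop.
 * Returns the number of completions taken from the eventfd, or -1 on error. */
int dispatch_once(void)
{
	int ep = epoll_create1(0);
	int efd = eventfd(0, 0);
	int aio_done = -1, n, i;
	struct epoll_event ev = { .events = EPOLLIN, .data.u32 = TAG_AIO };
	struct epoll_event events[8];
	uint64_t one = 1, count;

	if (ep < 0 || efd < 0)
		goto out;
	if (epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev) < 0 ||
	    write(efd, &one, sizeof one) != (ssize_t)sizeof one)
		goto out;

	n = epoll_wait(ep, events, 8, 1000);
	if (n < 0)
		goto out;
	aio_done = 0;
	for (i = 0; i < n; i++) {
		if (events[i].data.u32 == TAG_AIO) {
			/* A real program would call io_getevents(2) here. */
			if (read(efd, &count, sizeof count) == (ssize_t)sizeof count)
				aio_done += (int)count;
		} else {
			/* dispatch_epoll_event() for every other fd */
		}
	}
out:
	if (efd >= 0)
		close(efd);
	if (ep >= 0)
		close(ep);
	return aio_done;
}
```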
Davide Libenzi	e1ad7468c7	signal/timer/event: eventfd core This is a very simple and light file descriptor that can be used as an event wait/dispatch mechanism by userspace (both wait and dispatch) and by the kernel (dispatch only). It can be used instead of pipe(2) in all cases where a pipe would simply be used to signal events. Its kernel overhead is much lower than that of pipes, and it does not consume two fds. When used in the kernel, it can offer an fd-bridge to enable, for example, functionalities like KAIO or syslets/threadlets to signal to an fd the completion of certain operations. More generally, an eventfd can be used by the kernel to signal readiness, in a POSIX poll/select way, of interfaces that would otherwise be incompatible with it. The API is: int eventfd(unsigned int count); The eventfd API accepts an initial "count" parameter, and returns an eventfd fd. It supports poll(2) (POLLIN, POLLOUT, POLLERR), read(2) and write(2). The POLLIN flag is raised when the internal counter is greater than zero. The POLLOUT flag is raised when at least a value of "1" can be written to the internal counter. The POLLERR flag is raised when an overflow in the counter value is detected. The write(2) operation can never overflow the counter, since it blocks (unless O_NONBLOCK is set, in which case -EAGAIN is returned). But the eventfd_signal() function can do it, since it must not sleep during its operation. The read(2) function reads the __u64 counter value, and resets the internal value to zero. If the value read is equal to (__u64) -1, an overflow happened on the internal counter (due to 2^64 eventfd_signal() posts that have never been retired; unlikely, but possible). The write(2) call writes a __u64 count value, and adds it to the current counter. The eventfd fd also supports O_NONBLOCK. 
On the kernel side, we have: struct file *eventfd_fget(int fd); int eventfd_signal(struct file *file, unsigned int n); The eventfd_fget() function should be called to get a struct file* from an eventfd fd (this is an fget() plus a check that f_op is the eventfd fops pointer). The kernel can then call eventfd_signal() every time it wants to post an event to userspace. The eventfd_signal() function can be called from any context. A simple eventfd() test and benchmark is available here: http://www.xmailserver.org/eventfd-bench.c This is the eventfd-based version of pipetest-4 (pipe(2) based): http://www.xmailserver.org/pipetest-4.c Not that performance matters much in the eventfd case, but eventfd-bench shows almost double the performance of pipetest-4. [akpm@linux-foundation.org: fix i386 build] [akpm@linux-foundation.org: add sys_eventfd to sys_ni.c] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:36 -07:00
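A minimal userspace sketch of the counter semantics described in this commit: writes add to the counter, one read returns the sum and resets it to zero, and a non-blocking read of a zero counter fails with EAGAIN. Note that the API has changed since this commit: current glibc declares int eventfd(unsigned int initval, int flags) in <sys/eventfd.h>, so the sketch passes EFD_NONBLOCK as the second argument. eventfd_accumulate() is a hypothetical helper.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Create an eventfd with initial count `init`, add each value in `adds`,
 * then read the counter once. Returns the value read (UINT64_MAX on error)
 * and stores in *second_errno the errno of a second, non-blocking read,
 * which should be EAGAIN because the first read reset the counter. */
uint64_t eventfd_accumulate(unsigned int init, const uint64_t *adds, size_t n,
			    int *second_errno)
{
	uint64_t value = UINT64_MAX, again;
	int fd = eventfd(init, EFD_NONBLOCK);

	*second_errno = 0;
	if (fd < 0)
		return UINT64_MAX;
	for (size_t i = 0; i < n; i++)
		if (write(fd, &adds[i], sizeof adds[i]) != (ssize_t)sizeof adds[i])
			goto out;
	if (read(fd, &value, sizeof value) != (ssize_t)sizeof value) {
		value = UINT64_MAX;
		goto out;
	}
	if (read(fd, &again, sizeof again) < 0)
		*second_errno = errno;
out:
	close(fd);
	return value;
}
```

For example, with init = 5 and adds = {3, 4}, the function returns 12 and reports EAGAIN for the second read.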
Davide Libenzi	83f5d12669	signal/timer/event: timerfd compat code This patch implements the necessary compat code for the timerfd system call. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:36 -07:00
Davide Libenzi	b215e28399	signal/timer/event: timerfd core This patch introduces a new system call for timer events delivered through file descriptors. This allows timer events to be used with standard POSIX poll(2), select(2) and read(2). As a consequence of supporting the Linux f_op->poll subsystem, they can be used with epoll(2) too. The system call is defined as: int timerfd(int ufd, int clockid, int flags, const struct itimerspec *utmr); The "ufd" parameter allows for re-use (re-programming) of an existing timerfd without going through the close/open cycle (same as signalfd). If "ufd" is -1, a new file descriptor will be created, otherwise the existing "ufd" will be re-programmed. The "clockid" parameter is either CLOCK_MONOTONIC or CLOCK_REALTIME. The time specified in the "utmr->it_value" parameter is the expiry time for the timer. If the TFD_TIMER_ABSTIME flag is set in "flags", this is an absolute time, otherwise it's a relative time. If the time specified in "utmr->it_interval" is not zero (zero meaning .tv_sec == 0 and .tv_nsec == 0), it is the period at which the following ticks should be generated. The "utmr->it_interval" should be set to zero if only one tick is requested. Setting "utmr->it_value" to zero will disable the timer, or will create a timerfd without the timer enabled. The function returns the new (or same, in case "ufd" is a valid timerfd descriptor) file descriptor, or -1 in case of error. As stated before, the timerfd file descriptor supports poll(2), select(2) and epoll(2). When a timer event has occurred on the timerfd, a POLLIN mask will be returned. The read(2) call can be used, and it will return a u32 variable holding the number of "ticks" that happened on the interface since the last call to read(2). The read(2) call supports the O_NONBLOCK flag too, and EAGAIN will be returned if no ticks happened. 
A quick test program shows timerfd working correctly on my amd64 box: http://www.xmailserver.org/timerfd-test.c [akpm@linux-foundation.org: add sys_timerfd to sys_ni.c] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:36 -07:00
Davide Libenzi	6d18c92209	signal/timer/event: signalfd compat code This patch implements the necessary compat code for the signalfd system call. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:36 -07:00
Davide Libenzi	fba2afaaec	signal/timer/event: signalfd core This patch series implements the new signalfd() system call. I took part of the original Linus code (and you know how badly it can be broken :), and I added even more breakage ;) Signals are fetched from the same signal queue used by the process, so signalfd will compete with standard kernel delivery in dequeue_signal(). If you want to reliably fetch signals on the signalfd file, you need to block them with sigprocmask(SIG_BLOCK). This seems to be working fine on my Dual Opteron machine. I made a quick test program for it: http://www.xmailserver.org/signafd-test.c The signalfd() system call implements signal delivery into a file descriptor receiver. The signalfd file descriptor is created with the following API: int signalfd(int ufd, const sigset_t *mask, size_t masksize); The "ufd" parameter allows changing an existing signalfd sigmask without going through a close/create cycle (Linus' idea). Use "ufd" == -1 if you want a brand new signalfd file. The "mask" parameter specifies the set of signals that we are interested in. The "masksize" parameter is the size of "mask". The signalfd fd supports the poll(2) and read(2) system calls. The poll(2) call will return POLLIN when signals are available to be dequeued. As a direct consequence of supporting the Linux poll subsystem, the signalfd fd can be used together with epoll(2) too. The read(2) system call will return a "struct signalfd_siginfo" structure in the userspace supplied buffer. The return value is the number of bytes copied into the supplied buffer, or -1 in case of error. The read(2) call can also return 0, in case the sighand structure to which the signalfd was attached has been orphaned. The O_NONBLOCK flag is also supported, and read(2) will return -EAGAIN in case no signal is available. If the size of the buffer passed to read(2) is smaller than sizeof(struct signalfd_siginfo), -EINVAL is returned. 
A read from the signalfd can also return -ERESTARTSYS in case a signal hits the process. The format of struct signalfd_siginfo is as follows; as with a struct siginfo, which fields are valid depends on the (->code & __SI_MASK) value: struct signalfd_siginfo { __u32 signo; /* si_signo */ __s32 err; /* si_errno */ __s32 code; /* si_code */ __u32 pid; /* si_pid */ __u32 uid; /* si_uid */ __s32 fd; /* si_fd */ __u32 tid; /* si_tid */ __u32 band; /* si_band */ __u32 overrun; /* si_overrun */ __u32 trapno; /* si_trapno */ __s32 status; /* si_status */ __s32 svint; /* si_int */ __u64 svptr; /* si_ptr */ __u64 utime; /* si_utime */ __u64 stime; /* si_stime */ __u64 addr; /* si_addr */ }; [akpm@linux-foundation.org: fix signalfd_copyinfo() on i386] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:36 -07:00
Davide Libenzi	5dc8bf8132	signal/timer/event fds: anonymous inode source This patch adds an anonymous inode source, to be used for files that need an inode only in order to create a file. We do not care about having an inode for each file, and we do not even care about having different names in the associated dentries (dentry names will be the same for classes of file). This allows code reuse, and will be used by epoll, signalfd and timerfd (and whatever else there'll be). Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:36 -07:00
Sukadev Bhattiprolu	fa0334f19f	Replace pid_t in autofs with struct pid reference Make autofs container-friendly by caching a struct pid reference rather than a pid_t and using pid_nr() to retrieve a task's pid_t. ChangeLog: - Fix Eric Biederman's comments - Use find_get_pid() to hold a reference to oz_pgrp and release while unmounting; separate out changes to autofs and autofs4. - Fix Cedric's comments: retain old prototype of parse_options() and move necessary change to its caller. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: containers@lists.osdl.org Acked-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:36 -07:00
Sukadev Bhattiprolu	d78e53c89a	Fix some coding-style errors in autofs Fix coding style errors (extra spaces, long lines) in autofs and autofs4 files being modified for container/pidspace issues. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: <containers@lists.osdl.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:36 -07:00
Sukadev Bhattiprolu	e713d0dab2	attach_pid() with struct pid parameter attach_pid() currently takes a pid_t and then uses find_pid() to find the corresponding struct pid. Sometimes we already have the struct pid. We could then skip find_pid() if attach_pid() took a struct pid parameter. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: <containers@lists.osdl.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:35 -07:00
Miklos Szeredi	0ea9718016	consolidate generic_writepages and mpage_writepages Clean up massive code duplication between mpage_writepages() and generic_writepages(). The new generic function, write_cache_pages() takes a function pointer argument, which will be called for each page to be written. Maybe cifs_writepages() too can use this infrastructure, but I'm not touching that with a ten-foot pole. The upcoming page writeback support in fuse will also want this. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:35 -07:00
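A toy model of the callback-based design described above, assuming a made-up page type: one shared loop walks the dirty pages and hands each one to a caller-supplied writer, so different filesystems only differ in the callback. The real write_cache_pages() also takes the address_space and a struct writeback_control; toy_page, toy_write_cache_pages() and count_writer() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct toy_page {
	unsigned long index;
	bool dirty;
};

/* Per-page callback, in the spirit of writepage_t. */
typedef int (*toy_writepage_t)(struct toy_page *page, void *data);

/* Shared walk: clear each dirty page's flag and pass it to `writepage`;
 * stop at and return the first non-zero error, else return 0. */
int toy_write_cache_pages(struct toy_page *pages, size_t n,
			  toy_writepage_t writepage, void *data)
{
	for (size_t i = 0; i < n; i++) {
		int err;

		if (!pages[i].dirty)
			continue;
		pages[i].dirty = false;
		err = writepage(&pages[i], data);
		if (err)
			return err;
	}
	return 0;
}

/* Example writer: count how many pages were handed to it. */
int count_writer(struct toy_page *page, void *data)
{
	(void)page;
	++*(size_t *)data;
	return 0;
}
```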
Olaf Hering	4c64c30a5c	small cleanup in gpt partition handling Remove unused argument in is_pmbr_valid() Remove unneeded initialization of local variable legacy_mbr Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:34 -07:00
Geert Uytterhoeven	22258d406f	Let SYSV68_PARTITION default to yes on VME only Don't enable SYSV68 partition table support on all m68k boxes by default, only on Motorola VME boards. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Philippe De Muyter <phdm@macqel.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:33 -07:00
David Howells	45222b9e02	AFS: implement statfs Implement the statfs() op for AFS. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:32 -07:00
David Howells	0f300ca928	AFS: fix a couple of problems with unlinking AFS files Fix a couple of problems with unlinking AFS files. (1) The parent directory wasn't being updated properly between unlink() and the following lookup(). It seems that, for some reason, invalidate_remote_inode() wasn't discarding the directory contents correctly, so this patch calls invalidate_inode_pages2() instead on non-regular files. (2) afs_vnode_deleted_remotely() should handle vnodes that don't have a source server recorded without oopsing. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:32 -07:00
David Howells	9d577b6a31	AFS: fix interminable loop in afs_write_back_from_locked_page() The following bug was uncovered by compiling with the '-W' flag: CC [M] fs/afs/write.o fs/afs/write.c: In function ‘afs_write_back_from_locked_page’: fs/afs/write.c:398: warning: comparison of unsigned expression >= 0 is always true Loop variable 'n' is unsigned, so it wraps around happily as far as I can see. Trivial fix attached (compile tested only). Signed-off-by: Mika Kukkonen <mikukkon@iki.fi> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-11 08:29:32 -07:00
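The warning points at a classic pitfall: with an unsigned counter, n >= 0 is always true, so decrementing past zero wraps to the type's maximum instead of ending the loop. One common safe pattern, shown here as a hypothetical helper rather than the actual AFS fix, tests before decrementing:

```c
#include <stddef.h>

/* Buggy shape flagged by -W (n is unsigned, so n >= 0 never fails):
 *
 *	for (size_t n = count - 1; n >= 0; n--) ...
 *
 * Safe shape: test n, then decrement, so the check happens before n can
 * wrap. Writes count-1, ..., 1, 0 into out[] and returns how many were
 * written (0 when count == 0). */
size_t reverse_indices(size_t count, size_t *out)
{
	size_t written = 0;

	for (size_t n = count; n-- > 0; )
		out[written++] = n;
	return written;
}
```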
David Howells	acaebfd8a7	[MTD] generalise the handling of MTD-specific superblocks Generalise the handling of MTD-specific superblocks so that JFFS2 and ROMFS can both share it. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-05-11 12:14:15 +01:00
Steve Grubb	0a4ff8c259	[PATCH] Abnormal End of Processes Hi, I have been working on some code that detects abnormal events based on audit system events. One kind of event that we currently have no visibility for is when a program terminates due to a segfault, which should never happen on a production machine. And if it did, you'd want to investigate it. Attached is a patch that collects these events and sends them into the audit system. Signed-off-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-05-11 05:38:26 -04:00
Amy Griffis	4fc03b9beb	[PATCH] complete message queue auditing Handle the edge cases for POSIX message queue auditing. Collect inode info when opening an existing mq, and for send/receive operations. Remove audit_inode_update() as it has really evolved into the equivalent of audit_inode(). Signed-off-by: Amy Griffis <amy.griffis@hp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-05-11 05:38:26 -04:00
Amy Griffis	510f4006e7	[PATCH] audit inode for all xattr syscalls Collect inode info for the remaining xattr syscalls that operate on a file descriptor. These don't call a path_lookup variant, so they aren't covered by the general audit hook. Signed-off-by: Amy Griffis <amy.griffis@hp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-05-11 05:38:26 -04:00
J. Bruce Fields	129a84de23	locks: fix F_GETLK regression (failure to find conflicts) In `9d6a8c5c21` we changed posix_test_lock to modify its single file_lock argument instead of taking separate input and output arguments. This makes it no longer safe to set the output lock's fl_type to F_UNLCK before looking for a conflict, since that means searching for a conflict against a lock with type F_UNLCK. This fixes a regression which causes F_GETLK to incorrectly report no conflict on most filesystems (including any filesystem that doesn't do its own locking). Also fix posix_lock_to_flock() to copy the lock type. This isn't strictly necessary, since the caller already does this; but it seems less likely to cause confusion in the future. Thanks to Doug Chapman for the bug report. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Acked-by: Doug Chapman <doug.chapman@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-10 20:25:59 -07:00
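A userspace check for the behaviour this fix restores, as a sketch: one process holds a write lock and a second process asks F_GETLK whether a read lock would conflict. POSIX record locks are owned per process, so the query has to come from a forked child; on a kernel with the regression the child would see F_UNLCK instead of F_WRLCK. getlk_reports_conflict() is a hypothetical helper.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a child process sees the parent's write lock via F_GETLK,
 * 0 if F_GETLK wrongly reports no conflict, or -1 on error. */
int getlk_reports_conflict(void)
{
	FILE *f = tmpfile();
	int fd, status, result = -1;

	if (!f)
		return -1;
	fd = fileno(f);

	/* Parent: write lock over the whole file (l_len 0 means "to EOF and beyond"). */
	struct flock wr = { .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 0, .l_len = 0 };
	if (fcntl(fd, F_SETLK, &wr) == 0) {
		pid_t pid = fork();

		if (pid == 0) {
			/* Child: a different lock owner, so the parent's lock must conflict. */
			struct flock q = { .l_type = F_RDLCK, .l_whence = SEEK_SET, .l_start = 0, .l_len = 0 };

			if (fcntl(fd, F_GETLK, &q) < 0)
				_exit(2);
			_exit(q.l_type == F_WRLCK ? 1 : 0);	/* F_UNLCK here is the regression */
		}
		if (pid > 0 && waitpid(pid, &status, 0) == pid &&
		    WIFEXITED(status) && WEXITSTATUS(status) != 2)
			result = WEXITSTATUS(status);
	}
	fclose(f);	/* closing the fd also drops the parent's lock */
	return result;
}
```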
Simon Horman	d9de2622bd	Allow compat_ioctl.c to compile without CONFIG_NET A small regression appears to have been introduced in the recent patch "cleanup compat ioctl handling", which was included in Linus' tree after 2.6.20. siocdevprivate_ioctl() is no longer defined if CONFIG_NET is undefined, whereas previously it was a dummy function in this case. This causes compilation with CONFIG_COMPAT but without CONFIG_NET to fail. fs/compat_ioctl.c: In function `compat_sys_ioctl': fs/compat_ioctl.c:3571: warning: implicit declaration of function `siocdevprivate_ioctl' Cc: Christoph Hellwig <hch@lst.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-10 13:34:05 -07:00
Randy Dunlap	c4a7f5eb5f	ocfs2: kobject/kset foobar Fix gcc warning and Oops that it causes: fs/ocfs2/cluster/masklog.c:161: warning: assignment from incompatible pointer type [ 2776.204120] OCFS2 Node Manager 1.3.3 [ 2776.211729] BUG: spinlock bad magic on CPU#0, modprobe/4424 [ 2776.214269] lock: ffff810021c8fe18, .magic: ffffffff, .owner: /6394416, .owner_cpu: 0 [ 2776.217864] [ 2776.217865] Call Trace: [ 2776.219662] [<ffffffff803426c8>] spin_bug+0x9e/0xe9 [ 2776.221921] [<ffffffff803427bf>] _raw_spin_lock+0x23/0xf9 [ 2776.224417] [<ffffffff8051acf4>] _spin_lock+0x9/0xb [ 2776.226676] [<ffffffff8033c3b1>] kobject_shadow_add+0x98/0x1ac [ 2776.229367] [<ffffffff8033c4d0>] kobject_add+0xb/0xd [ 2776.231665] [<ffffffff8033c4df>] kset_add+0xd/0xf [ 2776.233845] [<ffffffff8033c5a6>] kset_register+0x23/0x28 [ 2776.236309] [<ffffffff8808ccb7>] :ocfs2_nodemanager:mlog_sys_init+0x68/0x6d [ 2776.239518] [<ffffffff8808ccee>] :ocfs2_nodemanager:o2cb_sys_init+0x32/0x4a [ 2776.242726] [<ffffffff880b80a6>] :ocfs2_nodemanager:init_o2nm+0xa6/0xd5 [ 2776.245772] [<ffffffff8025266c>] sys_init_module+0x1471/0x15d2 [ 2776.248465] [<ffffffff8033f250>] simple_strtoull+0x0/0xdc [ 2776.250959] [<ffffffff8020948e>] system_call+0x7e/0x83 Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-10 09:26:52 -07:00
David Howells	5bbf5d39f8	AFS: further write support fixes Further fixes for AFS write support: (1) The afs_send_pages() outer loop must do an extra iteration if it ends with 'first == last', because 'last' is inclusive in the page set; otherwise it fails to send the last page and complete the RxRPC op under some circumstances. (2) Similarly, the outer loop in afs_pages_written_back() must also do an extra iteration if it ends with 'first == last', otherwise it fails to clear PG_writeback on the last page under some circumstances. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-10 09:26:52 -07:00
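Both fixes come down to treating an inclusive range [first, last] like a half-open one. A hypothetical batching loop, not the AFS code, showing why the loop condition must be first <= last:

```c
#include <stddef.h>

/* Hypothetical helper: walk the inclusive page range [first, last] in
 * batches of at most `batch` pages and return how many pages were handled.
 * Assumes last < ULONG_MAX so `last - first + 1` cannot overflow. */
size_t pages_in_batches(unsigned long first, unsigned long last, size_t batch)
{
	size_t handled = 0;

	if (batch == 0)
		return 0;
	/* `<=`, not `<`: when first == last there is still one page left. */
	while (first <= last) {
		unsigned long n = last - first + 1;	/* pages remaining, inclusive */

		if (n > batch)
			n = batch;
		handled += n;
		first += n;
	}
	return handled;
}
```

With pages 0..8 and batches of 4, the loop reaches first == last == 8 after two batches; a `<` condition would stop there and drop page 8, while `<=` handles all 9 pages.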
David Howells	b9b1f8d593	AFS: write support fixes AFS write support fixes: (1) Support large files using the 64-bit file access operations if available on the server. (2) Use kmap_atomic() rather than kmap() in afs_prepare_page(). (3) Don't do stuff in afs_writepage() that's done by the caller. [akpm@linux-foundation.org: fix right shift count >= width of type] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-10 09:26:52 -07:00
Jesper Juhl	7a13e93228	NFS: Kill the obsolete NFS_PARANOIA Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-09 17:58:01 -04:00
Milind Arun Choudhary	fee7f23fea	NFS: use __set_current_state() use __set_current_state(TASK_) instead of current->state = TASK_, in fs/nfs Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-09 17:58:01 -04:00
Chuck Lever	e4cc6ee2e4	NFS: Clean up NFSv4 XDR error message Make it more useful for debugging purposes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-09 17:58:00 -04:00
Chuck Lever	6ce7dc9407	NFS: NFS client underestimates how large an NFSv4 SETATTR reply can be The maximum size of an NFSv4 SETATTR compound reply should include the GETATTR operation that we send. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-09 17:58:00 -04:00
Trond Myklebust	e70c490810	NFS: Remove redundant check in nfs_check_verifier() The check for nfs_attribute_timeout(dir) in nfs_check_verifier is redundant: nfs_lookup_revalidate() will already call nfs_revalidate_inode() on the parent dir when necessary. The only case where this is not done is the case of a negative dentry. Fix this case by moving up the revalidation code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-09 17:57:59 -04:00
Trond Myklebust	e62c2bba1f	NFS: Fix a jiffie wraparound issue dentry verifiers are always set to the parent directory's cache_change_attribute. There is no reason to be testing for anything other than equality when we're trying to find out if the dentry has been checked since the last time the directory was modified. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-05-09 17:57:58 -04:00
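The reasoning in the commit above can be illustrated with a small userspace model. The names are hypothetical and this is not the NFS client code. A dentry keeps a copy of the parent directory's change attribute, and only equality says whether that copy is still current. The ordered comparison is included only for contrast. It gives the wrong answer once the counter wraps.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model: 'dir_change_attr' stands in for the parent directory's
 * cache_change_attribute, a jiffies-like unsigned counter that may wrap.
 * A dentry's verifier is a copy of that value taken when it was checked. */
static bool verifier_is_current(unsigned long dentry_verf,
                                unsigned long dir_change_attr)
{
    /* Any difference at all means the directory changed: revalidate. */
    return dentry_verf == dir_change_attr;
}

/* Contrast only (not copied from the old code): an ordered test that
 * misbehaves after the counter wraps around to small values. */
static bool verifier_is_current_ordered(unsigned long dentry_verf,
                                        unsigned long dir_change_attr)
{
    return dentry_verf >= dir_change_attr;
}
```

After a wrap from ULONG_MAX to 0, the ordered test still reports the dentry as current. The equality test correctly forces a revalidation.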
Linus Torvalds	ba7cc09c9c	Merge git://git.infradead.org/mtd-2.6 * git://git.infradead.org/mtd-2.6: (21 commits) [MTD] [CHIPS] Remove MTD_OBSOLETE_CHIPS (jedec, amd_flash, sharp) [MTD] Delete allegedly obsolete "bank_size" field of mtd_info. [MTD] Remove unnecessary user space check from mtd.h. [MTD] [MAPS] Remove flash maps for no longer supported 405LP boards [MTD] [MAPS] Fix missing printk() parameter in physmap_of.c MTD driver [MTD] [NAND] platform NAND driver: add driver [MTD] [NAND] platform NAND driver: update header [JFFS2] Simplify and clean up jffs2_add_tn_to_tree() some more. [JFFS2] Remove another bogus optimisation in jffs2_add_tn_to_tree() [JFFS2] Remove broken insert_point optimisation in jffs2_add_tn_to_tree() [JFFS2] Remember to calculate overlap on nodes which replace older nodes [JFFS2] Don't advance c->wbuf_ofs to next eraseblock after wbuf flush [MTD] [NAND] at91_nand.c: CMDLINE_PARTS support [MTD] [NAND] Tidy up handling of page number in nand_block_bad() [MTD] block2mtd_paramline[] mustn't be __initdata [MTD] [NAND] Support multiple chips in CAFÉ driver [MTD] [NAND] Rename cafe.c to cafe_nand.c and remove the multi-obj magic [MTD] [NAND] Use rslib for CAFÉ ECC [RSLIB] Support non-canonical GF representations [JFFS2] Remove dead file histo_mips.h ...	2007-05-09 13:10:11 -07:00
Linus Torvalds	9a9136e270	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits) sound: convert "sound" subdirectory to UTF-8 MAINTAINERS: Add cxacru website/mailing list include files: convert "include" subdirectory to UTF-8 general: convert "kernel" subdirectory to UTF-8 documentation: convert the Documentation directory to UTF-8 Convert the toplevel files CREDITS and MAINTAINERS to UTF-8. remove broken URLs from net drivers' output Magic number prefix consistency change to Documentation/magic-number.txt trivial: s/i_sem /i_mutex/ fix file specification in comments drivers/base/platform.c: fix small typo in doc misc doc and kconfig typos Remove obsolete fat_cvf help text Fix occurrences of "the the " Fix minor typoes in kernel/module.c Kconfig: Remove reference to external mqueue library Kconfig: A couple of grammatical fixes in arch/i386/Kconfig Correct comments in genrtc.c to refer to correct /proc file. Fix more "deprecated" spellos. Fix "deprecated" typoes. ... Fix trivial comment conflict in kernel/relay.c.	2007-05-09 12:54:17 -07:00
Rafael J. Wysocki	8bb7844286	Add suspend-related notifications for CPU hotplug Since nonboot CPUs are now disabled after tasks and devices have been frozen and the CPU hotplug infrastructure is used for this purpose, we need special CPU hotplug notifications that will help the CPU-hotplug-aware subsystems distinguish normal CPU hotplug events from CPU hotplug events related to a system-wide suspend or resume operation in progress. This patch introduces such notifications and causes them to be used during suspend and resume transitions. It also changes all of the CPU-hotplug-aware subsystems to take these notifications into consideration (for now they are handled in the same way as the corresponding "normal" ones). [oleg@tv-sign.ru: cleanups] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:56 -07:00
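The idea of "the same event, plus a suspend/resume marker" can be sketched in C. All event names and values below are hypothetical stand-ins, not the kernel's constants. The commit message does not say whether subsystems mask off the flag or list extra case labels. Masking is just one way to treat both variants identically.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event codes: each suspend/resume-time event is the normal
 * event with an extra "tasks frozen" bit set. */
enum {
    EV_ONLINE        = 0x1,
    EV_DEAD          = 0x2,
    EV_TASKS_FROZEN  = 0x10,
    EV_ONLINE_FROZEN = EV_ONLINE | EV_TASKS_FROZEN,
    EV_DEAD_FROZEN   = EV_DEAD | EV_TASKS_FROZEN,
};

/* Does this hotplug event belong to a system-wide suspend or resume? */
static bool is_frozen_event(unsigned long action)
{
    return (action & EV_TASKS_FROZEN) != 0;
}

/* Strip the marker so a handler can treat both variants the same way. */
static unsigned long base_event(unsigned long action)
{
    return action & ~(unsigned long)EV_TASKS_FROZEN;
}
```

A subsystem that later needs to behave differently during suspend can test `is_frozen_event()` before dispatching on `base_event()`.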
Nate Diller	f2fff59695	reiserfs: use zero_user_page Use zero_user_page() instead of open-coding it. Signed-off-by: Nate Diller <nate.diller@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:56 -07:00
Nate Diller	0c11d7a9e9	ext3: use zero_user_page Use zero_user_page() instead of open-coding it. Signed-off-by: Nate Diller <nate.diller@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:55 -07:00
Nate Diller	f36dca90e6	affs: use zero_user_page Use zero_user_page() instead of open-coding it. Signed-off-by: Nate Diller <nate.diller@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:55 -07:00
Nate Diller	01f2705daf	fs: convert core functions to zero_user_page It's very common for file systems to need to zero part or all of a page, the simplest way is just to use kmap_atomic() and memset(). There's actually a library function in include/linux/highmem.h that does exactly that, but it's confusingly named memclear_highpage_flush(), which is descriptive of how it does the work rather than what the purpose is. So this patchset renames the function to zero_user_page(), and calls it from the various places that currently open code it. This first patch introduces the new function call, and converts all the core kernel callsites, both the open-coded ones and the old memclear_highpage_flush() ones. Following this patch is a series of conversions for each file system individually, per AKPM, and finally a patch deprecating the old call. The diffstat below shows the entire patchset. [akpm@linux-foundation.org: fix a few things] Signed-off-by: Nate Diller <nate.diller@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:55 -07:00
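The operation described above, zeroing part of a page, is modeled below in userspace. This is a sketch on a plain buffer, assuming 4 KiB pages. `model_zero_user_page` is a hypothetical name. In the kernel the page is first mapped with kmap_atomic(), then unmapped, and the data cache is flushed (hence the old name memclear_highpage_flush()). None of that applies to an ordinary array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Clear 'size' bytes starting at 'offset' within one page-sized buffer.
 * The range check mirrors the "must stay inside the page" requirement. */
static void model_zero_user_page(unsigned char *page, size_t offset, size_t size)
{
    assert(offset <= MODEL_PAGE_SIZE && size <= MODEL_PAGE_SIZE - offset);
    memset(page + offset, 0, size);
}
```

Callers that previously open-coded the map/memset/unmap sequence reduce to one call with an offset and length. This is the readability gain the patchset is after.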
NeilBrown	b41eeef14d	knfsd: avoid Oops if buggy userspace performs confusing filehandle->dentry mapping When a lookup request arrives, nfsd uses information provided by userspace (mountd) to find the right filesystem. It then assumes that the same filehandle type as the incoming filehandle can be used to create an outgoing filehandle. However if mountd is buggy, or maybe just being creative, the filesystem may not support that filesystem type, and the kernel could oops, particularly if 'ex_uuid' is NULL but a FSID_UUID* filehandle type is used. So add some proper checking that the fsid version/type from the incoming filehandle is actually supportable, and ignore that information if it isn't supportable. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:54 -07:00
NeilBrown	072f62ed85	knfsd: various nfsd xdr cleanups 1/ decode_sattr and decode_sattr3 never return NULL, so remove several checks for that. ditto for xdr_decode_hyper. 2/ replace some open coded XDR_QUADLEN calls with calls to XDR_QUADLEN 3/ in decode_writeargs, simplify an 'if' to use a single calculation. .page_len is the length of that part of the packet that did not fit in the first page (the head). So the length of the data part is the remainder of the head, plus page_len. 4/ other minor cleanups. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:54 -07:00
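Items 2 and 3 above both come down to small arithmetic. The quad-length macro below uses the usual XDR round-up to 4-byte units, which I believe matches the kernel's XDR_QUADLEN(). `write_payload_len` is a hypothetical helper modeling the single calculation described for decode_writeargs. It assumes `hdr_len` is the number of head bytes taken up by the WRITE arguments that precede the data.

```c
#include <assert.h>
#include <stddef.h>

/* XDR encodes everything in 4-byte units: bytes -> number of 32-bit words. */
#define XDR_QUADLEN(l) (((l) + 3) >> 2)

/* Hypothetical model: the payload is whatever remains of the head buffer
 * after the argument header, plus page_len bytes that spilled into pages. */
static size_t write_payload_len(size_t head_len, size_t hdr_len, size_t page_len)
{
    assert(hdr_len <= head_len);
    return (head_len - hdr_len) + page_len;
}
```

For example, a 1024-byte head with 100 bytes of arguments and 4096 bytes in pages yields a 5020-byte payload.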
Christoph Hellwig	f725b217b1	knfsd: trivial makefile cleanup kbuild directly interprets <modulename>-y as objects to build into a module, no need to assign it to the old foo-objs variable. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:54 -07:00
NeilBrown	402acd29e5	knfsd: avoid use of uninitialised variables on error path when nfs exports We need to zero various parts of 'exp' before any 'goto out', otherwise when we go to free the contents... we die. Signed-off-by: Neil Brown <neilb@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:54 -07:00
Jeff Layton	cd123012d9	RPC: add wrapper for svc_reserve to account for checksum When the kernel calls svc_reserve to downsize the expected size of an RPC reply, it fails to account for the possibility of a checksum at the end of the packet. If a client mounts a NFSv2/3 with sec=krb5i/p, and does I/O then you'll generally see messages similar to this in the server's ring buffer: RPC request reserved 164 but used 208 While I was never able to verify it, I suspect that this problem is also the root cause of some oopses I've seen under these conditions: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227726 This is probably also a problem for other sec= types and for NFSv4. The large reserved size for NFSv4 compound packets seems to generally paper over the problem, however. This patch adds a wrapper for svc_reserve that accounts for the possibility of a checksum. It also fixes up the appropriate callers of svc_reserve to call the wrapper. For now, it just uses a hardcoded value that I determined via testing. That value may need to be revised upward as things change, or we may want to eventually add a new auth_op that attempts to calculate this somehow. Unfortunately, there doesn't seem to be a good way to reliably determine the expected checksum length prior to actually calculating it, particularly with schemes like spkm3. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Acked-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:54 -07:00
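The accounting problem above is modeled below. A reply that carries a trailing checksum outgrows a reservation sized for the body alone. The slack constant and both helper names are hypothetical. The patch uses a value found by testing, and that value is not reproduced here. The 164/208 figures come from the warning quoted in the commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical checksum allowance; illustrative only, not the patch's value. */
#define HYPOTHETICAL_CHECKSUM_SLACK 64

/* Wrapper idea: grow the reservation when the security flavor
 * (e.g. krb5i/krb5p) may append a checksum after the reply body. */
static int reserve_with_checksum(int reply_space, bool flavor_adds_checksum)
{
    return reply_space + (flavor_adds_checksum ? HYPOTHETICAL_CHECKSUM_SLACK : 0);
}

/* "RPC request reserved X but used Y" is logged when this is false. */
static bool reply_fits(int reserved, int used)
{
    return used <= reserved;
}
```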
Eric W. Biederman	6697164335	nfsd/nfs4state: remove unnecessary daemonize call Acked-by: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:54 -07:00
Peter Staubach	f34b95689d	The NFSv2/NFSv3 server does not handle zero length WRITE requests correctly The NFSv2 and NFSv3 servers do not handle WRITE requests for 0 bytes correctly. The specifications indicate that the server should accept the request, but it should mostly turn into a no-op. Currently, the server will return an XDR decode error, which it should not. Attached is a patch which addresses this issue. It also adds some boundary checking to ensure that the request contains as much data as was requested to be written. It also correctly handles an NFSv3 request which requests to write more data than the server has stated that it is prepared to handle. Previously, there was some support which looked like it should work, but wasn't quite right. Signed-off-by: Peter Staubach <staubach@redhat.com> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:54 -07:00
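The two decode-time rules stated above are sketched below as a hypothetical check. A zero-byte WRITE is a valid request that writes nothing. Otherwise, the request must carry at least as many data bytes as it claims. The handling of requests larger than the server's advertised maximum is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model. 'count' is the byte count the client asks to write;
 * 'bytes_received' is how much payload actually arrived with the request. */
static bool write_args_valid(size_t count, size_t bytes_received)
{
    if (count == 0)
        return true;                 /* legal request, effectively a no-op */
    return bytes_received >= count;  /* reject requests shorter than claimed */
}
```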
Adrian Bunk	8842c9655b	remove nfs4_acl_add_ace() nfs4_acl_add_ace() can now be removed. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Neil Brown <neilb@cse.unsw.edu.au> Acked-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:54 -07:00
Oleg Nesterov	28e53bddf8	unify flush_work/flush_work_keventd and rename it to cancel_work_sync flush_work(wq, work) doesn't need the first parameter, we can use cwq->wq (this was possible from the very beginning, I missed this). So we can unify flush_work_keventd and flush_work. Also, rename flush_work() to cancel_work_sync() and fix all callers. Perhaps this is not the best name, but "flush_work" is really bad. (akpm: this is why the earlier patches bypassed maintainers) Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Jeff Garzik <jeff@garzik.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Auke Kok <auke-jan.h.kok@intel.com>, Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:53 -07:00
Andrew Morton	a9df62c758	aio: use flush_work() Migrate AIO over to use flush_work(). Cc: "Maciej W. Rozycki" <macro@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:51 -07:00
David Howells	31143d5d51	AFS: implement basic file write support Implement support for writing to regular AFS files, including: (1) write (2) truncate (3) fsync, fdatasync (4) chmod, chown, chgrp, utime. AFS writeback attempts to batch writes into chunks as large as it can manage up to the point that it writes back 65535 pages in one chunk or it meets a locked page. Furthermore, if a page has been written to using a particular key, then should another write to that page use some other key, the first write will be flushed before the second is allowed to take place. If the first write fails due to a security error, then the page will be scrapped and reread before the second write takes place. If a page is dirty and the callback on it is broken by the server, then the dirty data is not discarded (same behaviour as NFS). Shared-writable mappings are not supported by this patch. [akpm@linux-foundation.org: fix a bunch of warnings] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:50 -07:00
David Howells	416351f28d	AFS: AFS fixups Make some miscellaneous changes to the AFS filesystem: (1) Assert RCU barriers on module exit to make sure RCU has finished with callbacks in this module. (2) Correctly handle the AFS server returning a zero-length read. (3) Split out data zapping calls into one function (afs_zap_data). (4) Rename some afs_file_*() functions to afs_*() where they apply to non-regular files too. (5) Be consistent about the presentation of volume ID:vnode ID in debugging output. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:50 -07:00
Josef 'Jeff' Sipek	2dfdd266b9	fs: use path_walk in do_path_lookup Since path_walk sets the total_link_count to 0 and calls link_path_walk, we can just call path_walk directly. Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:50 -07:00
Josef 'Jeff' Sipek	62ce39c531	fs: fix indentation in do_path_lookup Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:49 -07:00
Akinobu Mita	92f4c701aa	use simple_read_from_buffer() in fs/ Cleanup using simple_read_from_buffer() in binfmt_misc, configfs, and sysfs. Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Joel Becker <joel.becker@oracle.com> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-09 12:30:49 -07:00
Uwe Kleine-König	5886269962	fix file specification in comments Many files include the filename at the beginning, several used a wrong one. Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-05-09 08:58:16 +02:00
Alexander E. Patrakov	148e423f90	Remove obsolete fat_cvf help text The text removed by the following patch refers to functionality that never worked, to non-existing documentation file, and to mount options marked as obsolete in the module. Signed-off-by: Alexander E. Patrakov <patrakov@ums.usu.ru> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-05-09 08:58:15 +02:00
Michael Opdenacker	59c51591a0	Fix occurrences of "the the " Signed-off-by: Michael Opdenacker <michael@free-electrons.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-05-09 08:57:56 +02:00
Robert P. J. Day	beb7dd86a1	Fix misspellings collected by members of KJ list. Fix the misspellings of "propogate", "writting" and (oh, the shame :-) "kenrel" in the source tree. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-05-09 07:14:03 +02:00
WANG Cong	ccf6780dc3	Style fix in fs/select.c Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-05-09 07:10:02 +02:00
Ronni Nielsen	0f8952c2fa	fs/libfs.c: >80 columns line break fix Signed-off-by: Ronni Nielsen <theronni@gmail.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-05-09 06:44:57 +02:00
David Rientjes	4b8df8915a	smaps: only define clear_refs for CONFIG_MMU /proc/pid/clear_refs is only defined in the CONFIG_MMU case, so make sure we don't have any references to clear_refs_smap() in generic procfs code. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 20:41:14 -07:00
Linus Torvalds	7b82dc0e64	Remove suid/sgid bits on [f]truncate() .. to match what we do on write(). This way, people who write to files by using [f]truncate + writable mmap have the same semantics as if they were using the write() family of system calls. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 20:10:00 -07:00
Linus Torvalds	60c9b2746f	Merge git://oss.sgi.com:8090/xfs/xfs-2.6 * git://oss.sgi.com:8090/xfs/xfs-2.6: [XFS] Add lockdep support for XFS [XFS] Fix race in xfs_write() b/w dmapi callout and direct I/O checks. [XFS] Get rid of redundant "required" in msg. [XFS] Export via a function xfs_buftarg_list for use by kdb/xfsidbg. [XFS] Remove unused ilen variable and references. [XFS] Fix to prevent the notorious 'NULL files' problem after a crash. [XFS] Fix race condition in xfs_write(). [XFS] Fix uquota and oquota enforcement problems. [XFS] propogate return codes from flush routines [XFS] Fix quotaon syscall failures for group enforcement requests. [XFS] Invalidate quotacheck when mounting without a quota type. [XFS] reducing the number of random number functions. [XFS] remove more misc. unused args [XFS] the "aendp" arg to xfs_dir2_data_freescan is always NULL, remove it. [XFS] The last argument "lsn" of xfs_trans_commit() is always called with	2007-05-08 11:59:33 -07:00
Linus Torvalds	02a93208ed	Merge branch 'for-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-block * 'for-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-block: [PATCH] ll_rw_blk: fix missing bounce in blk_rq_map_kern() [PATCH] splice: always call into page_cache_readahead() [PATCH] splice(): fix interaction with readahead	2007-05-08 11:34:52 -07:00
Linus Torvalds	18062a91d2	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: JFS: Fix race waking up jfsIO kernel thread JFS: use __set_current_state() Copy i_flags to jfs inode flags on write JFS: document uid, gid, and umask mount options in jfs.txt	2007-05-08 11:32:30 -07:00
Dmitriy Monakhov	951744fea0	udf: possible null pointer dereference while load_partition sb_read may return NULL, let's explicitly check it. Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:22 -07:00
Jan Kara	31170b6ad4	udf: support files larger than 1G Make UDF work correctly for files larger than 1GB. As no extent can be longer than (1<<30)-blocksize bytes, we have to create several extents if a big hole is being created. As a side-effect, we now don't discard preallocated blocks when creating a hole. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:21 -07:00
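The extent limit quoted above determines how many extents a large hole needs. A hypothetical helper (not the UDF code) computes this with 64-bit arithmetic, splitting the hole into maximal extents of (1<<30) - blocksize bytes each.

```c
#include <assert.h>
#include <stdint.h>

/* Longest single extent, per the commit message: (1<<30) - blocksize bytes. */
static uint64_t max_extent_bytes(uint32_t blocksize)
{
    return (UINT64_C(1) << 30) - blocksize;
}

/* Hypothetical helper: number of maximal-length extents for a hole. */
static uint64_t extents_for_hole(uint64_t hole_len, uint32_t blocksize)
{
    uint64_t max = max_extent_bytes(blocksize);
    return (hole_len + max - 1) / max;   /* round up */
}
```

With 2048-byte blocks, a 3 GiB hole needs 4 extents rather than 3. Each extent falls 2048 bytes short of 1 GiB.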
Jan Kara	948b9b2c96	udf: add assertions Add a few assertions into udf_discard_prealloc() to check that the file is sane (mostly helps debugging further patches ;). Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:21 -07:00
Jan Kara	3bf25cb40d	udf: use get_bh() Make UDF use get_bh() instead of directly accessing b_count and use brelse() instead of udf_release_data() which does just brelse()... Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:21 -07:00
Jan Kara	ff116fc8d1	UDF: introduce struct extent_position Introduce a structure extent_position to store a position of an extent and the corresponding buffer_head in one place. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:21 -07:00
Jan Kara	60448b1d6d	udf: use sector_t and loff_t for file offsets Use sector_t and loff_t for file offsets in UDF filesystem. Otherwise an overflow may occur for long files. Also make inode_bmap() return offset in the extent in number of blocks instead of number of bytes - for most callers this is more convenient. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:21 -07:00
Peter Zijlstra	277866a0e3	nfs: fix congestion control: use atomic_longs Change the atomic_t in struct nfs_server to atomic_long_t in anticipation of machines that can handle 8+TB of (4K) pages under writeback. However I suspect other things in NFS will start going bang by then. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:21 -07:00
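The "8+TB" figure is the point at which a signed 32-bit counter of 4 KiB pages runs out. This assumes atomic_t wraps a plain int:

```latex
\[
\frac{8\,\mathrm{TiB}}{4\,\mathrm{KiB/page}}
  = \frac{2^{43}\,\mathrm{B}}{2^{12}\,\mathrm{B/page}}
  = 2^{31}\ \text{pages}
  > 2^{31} - 1 = \texttt{INT\_MAX}
\]
```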
Ulrich Drepper	1c710c896e	utimensat implementation Implement utimensat(2) which is an extension to futimesat(2) in that it a) supports nano-second resolution for the timestamps b) allows to selectively ignore the atime/mtime value c) allows to selectively use the current time for either atime or mtime d) supports changing the atime/mtime of a symlink itself along the lines of the BSD lutimes(3) functions For this change the internally used do_utimes() functions was changed to accept a timespec time value and an additional flags parameter. Additionally the sys_utime function was changed to match compat_sys_utime which already use do_utimes instead of duplicating the work. Also, the completely missing futimensat() functionality is added. We have such a function in glibc but we have to resort to using /proc/self/fd/* which not everybody likes (chroot etc). Test application (the syscall number will need per-arch editing): #include <errno.h> #include <fcntl.h> #include <time.h> #include <sys/time.h> #include <stddef.h> #include <syscall.h> #define __NR_utimensat 280 #define UTIME_NOW ((1l << 30) - 1l) #define UTIME_OMIT ((1l << 30) - 2l) int main(void) { int status = 0; int fd = open("ttt", O_RDWR|O_CREAT|O_EXCL, 0666); if (fd == -1) error (1, errno, "failed to create test file \"ttt\""); struct stat64 st1; if (fstat64 (fd, &st1) != 0) error (1, errno, "fstat failed"); struct timespec t[2]; t[0].tv_sec = 0; t[0].tv_nsec = 0; t[1].tv_sec = 0; t[1].tv_nsec = 0; if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0) error (1, errno, "utimensat failed"); struct stat64 st2; if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0) { puts ("atim not reset to zero"); status = 1; } if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0) { puts ("mtim not reset to zero"); status = 1; } if (status != 0) goto out; t[0] = st1.st_atim; t[1].tv_sec = 0; t[1].tv_nsec = UTIME_OMIT; if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0) error (1, errno, "utimensat failed"); if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); if (st2.st_atim.tv_sec != st1.st_atim.tv_sec || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec) { puts ("atim not set"); status = 1; } if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0) { puts ("mtim changed from zero"); status = 1; } if (status != 0) goto out; t[0].tv_sec = 0; t[0].tv_nsec = UTIME_OMIT; t[1] = st1.st_mtim; if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0) error (1, errno, "utimensat failed"); if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); if (st2.st_atim.tv_sec != st1.st_atim.tv_sec || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec) { puts ("mtim changed from original time"); status = 1; } if (st2.st_mtim.tv_sec != st1.st_mtim.tv_sec || st2.st_mtim.tv_nsec != st1.st_mtim.tv_nsec) { puts ("mtim not set"); status = 1; } if (status != 0) goto out; sleep (2); t[0].tv_sec = 0; t[0].tv_nsec = UTIME_NOW; t[1].tv_sec = 0; t[1].tv_nsec = UTIME_NOW; if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0) error (1, errno, "utimensat failed"); if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); struct timeval tv; gettimeofday(&tv,NULL); if (st2.st_atim.tv_sec <= st1.st_atim.tv_sec || st2.st_atim.tv_sec > tv.tv_sec) { puts ("atim not set to NOW"); status = 1; } if (st2.st_mtim.tv_sec <= st1.st_mtim.tv_sec || st2.st_mtim.tv_sec > tv.tv_sec) { puts ("mtim not set to NOW"); status = 1; } if (symlink ("ttt", "tttsym") != 0) error (1, errno, "cannot create symlink"); t[0].tv_sec = 0; t[0].tv_nsec = 0; t[1].tv_sec = 0; t[1].tv_nsec = 0; if (syscall(__NR_utimensat, AT_FDCWD, "tttsym", t, AT_SYMLINK_NOFOLLOW) != 0) error (1, errno, "utimensat failed"); if (lstat64 ("tttsym", &st2) != 0) error (1, errno, "lstat failed"); if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0) { puts ("symlink atim not reset to zero"); status = 1; } if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0) { puts ("symlink mtim not reset to zero"); status = 1; } if (status != 0) goto out; t[0].tv_sec = 1; t[0].tv_nsec = 0; t[1].tv_sec = 1; t[1].tv_nsec = 0; if (syscall(__NR_utimensat, fd, NULL, t, 0) != 0) error (1, errno, "utimensat failed"); if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); if (st2.st_atim.tv_sec != 1 || st2.st_atim.tv_nsec != 0) { puts ("atim not reset to one"); status = 1; } if (st2.st_mtim.tv_sec != 1 || st2.st_mtim.tv_nsec != 0) { puts ("mtim not reset to one"); status = 1; } if (status == 0) puts ("all OK"); out: close (fd); unlink ("ttt"); unlink ("tttsym"); return status; } [akpm@linux-foundation.org: add missing i386 syscall table entry] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Cc: Alexey Dobriyan <adobriyan@openvz.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:18 -07:00
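The commit's test program calls the syscall by number. Where the C library provides a utimensat() wrapper (glibc does, with UTIME_NOW and UTIME_OMIT in <sys/stat.h>), the "change one timestamp, keep the other" case reduces to the sketch below. The helper names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* expose utimensat() and UTIME_* with glibc */
#endif
#include <assert.h>
#include <fcntl.h>         /* AT_FDCWD */
#include <stdio.h>         /* perror */
#include <sys/stat.h>      /* utimensat, UTIME_NOW, UTIME_OMIT */
#include <time.h>          /* struct timespec, time_t */

/* Set only the access time of 'path'; UTIME_OMIT leaves mtime untouched. */
static int set_atime_only(const char *path, time_t sec)
{
    struct timespec ts[2];
    ts[0].tv_sec = sec;            /* new atime, whole seconds */
    ts[0].tv_nsec = 0;
    ts[1].tv_sec = 0;              /* ignored: tv_nsec requests "omit" */
    ts[1].tv_nsec = UTIME_OMIT;
    if (utimensat(AT_FDCWD, path, ts, 0) != 0) {
        perror("utimensat");
        return -1;
    }
    return 0;
}

/* Set both timestamps to the current time. */
static int touch_now(const char *path)
{
    struct timespec ts[2] = { { 0, UTIME_NOW }, { 0, UTIME_NOW } };
    if (utimensat(AT_FDCWD, path, ts, 0) != 0) {
        perror("utimensat");
        return -1;
    }
    return 0;
}
```

Passing AT_SYMLINK_NOFOLLOW as the last argument instead of 0 would act on a symlink itself, as the test program's "tttsym" case does.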
Jeff Layton	1a1c9bb433	inode numbering: change libfs sb creation routines to avoid collisions with their root inodes This patch makes it so that simple_fill_super and get_sb_pseudo assign their root inodes to be number 1. It also fixes up a couple of callers of simple_fill_super that were passing in files arrays that had an index at number 1, and adds a warning for any caller that sends in such an array. It would have been nice to have made it so that it wasn't possible to make such a collision, but some callers need to be able to control what inode number their entries get, so I think this is the best that can be done. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:16 -07:00
Jeff Layton	866b04fccb	inode numbering: make static counters in new_inode and iunique be 32 bits The problems are: - on filesystems w/o permanent inode numbers, i_ino values can be larger than 32 bits, which can cause problems for some 32 bit userspace programs on a 64 bit kernel. We can't do anything for filesystems that have actual >32-bit inode numbers, but on filesystems that generate i_ino values on the fly, we should try to have them fit in 32 bits. We could trivially fix this by making the static counters in new_inode and iunique 32 bits, but... - many filesystems call new_inode and assume that the i_ino values they are given are unique. They are not guaranteed to be so, since the static counter can wrap. This problem is exacerbated by the fix for #1. - after allocating a new inode, some filesystems call iunique to try to get a unique i_ino value, but they don't actually add their inodes to the hashtable, and so they're still not guaranteed to be unique if that counter wraps. This patch set takes the simpler approach of simply using iunique and hashing the inodes afterward. Christoph H. previously mentioned that he thought that this approach may slow down lookups for filesystems that currently hash their inodes. The questions are: 1) how much would this slow down lookups for these filesystems? 2) is it enough to justify adding more infrastructure to avoid it? What might be best is to start with this approach and then only move to using IDR or some other scheme if these extra inodes in the hashtable prove to be problematic. I've done some cursory testing with this patch and the overhead of hashing and unhashing the inodes with pipefs is pretty low -- just a few seconds of system time added on to the creation and destruction of 10 million pipes (very similar to the overhead that the IDR approach would add). The hard thing to measure is what effect this has on other filesystems. I'm open to ways to try and gauge this. Again, I've only converted pipefs as an example. If this approach is acceptable then I'll start work on patches to convert other filesystems. With a pretty-much-worst-case microbenchmark provided by Eric Dumazet <dada1@cosmosbay.com>: hashing patch (pipebench): sys 1m15.329s sys 1m16.249s sys 1m17.169s unpatched (pipebench): sys 1m9.836s sys 1m12.541s sys 1m14.153s Which works out to 1.05642174294555027017. So ~5-6% slowdown. This patch: When a 32-bit program that was not compiled with large file offsets does a stat and gets a st_ino value back that won't fit in the 32 bit field, glibc (correctly) generates an EOVERFLOW error. We can't do anything about fs's with larger permanent inode numbers, but when we generate them on the fly, we ought to try and have them fit within a 32 bit field. This patch takes the first step toward this by making the static counters in these two functions be 32 bits. [jlayton@redhat.com: mention that it's only the case for 32bit, non-LFS stat] Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:16 -07:00
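The trade-off described above is sketched below in C with hypothetical helpers. A 32-bit counter guarantees that generated numbers fit a non-LFS `st_ino`. The same width makes wraparound, and therefore duplicates, more likely. That is why the series pairs the counter change with iunique() and hashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Would this inode number fit st_ino for a 32-bit, non-LFS stat()?
 * If not, glibc reports EOVERFLOW, as described in the commit. */
static bool ino_fits_32bit(uint64_t ino)
{
    return ino <= UINT32_MAX;
}

/* Hypothetical generator with a 32-bit counter: every value fits, but the
 * counter wraps, so uniqueness still needs a separate check. */
static uint32_t next_generated_ino(uint32_t *counter)
{
    return ++*counter;
}
```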
Alexey Kuznetsov	b140f25108	Invalid return value of execve() resulting in oopses When the ELF loader fails to map an executable (due to memory shortage or because the binary is malformed), it can return 0. Normally, this is invisible because the process is killed with SIGKILL and never returns to user space. But if exec() is called from a kernel thread (hotplug, whatever), the consequences are more interesting and vary depending on architecture. i386: nothing especially interesting, execve() just returns with "success" :-) x86_64: a fake zero frame is used on the way to the caller, RSP/RIP are loaded with zeros, ergo... double fault. ia64: similar to i386, but r32...r95 are corrupted. Sometimes it oopses due to a return to a zero PC, sometimes it sees NaT in rXX and oopses due to NaT consumption. Signed-off-by: Alexey Kuznetsov <alexey@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:15 -07:00
Akinobu Mita	0c28f287aa	procfs: use simple_read_from_buffer() Cleanup using simple_read_from_buffer() in procfs. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:14 -07:00
Andreas Schwab	83ae1b79c8	Fix error handling in HDIO_GETGEO compat wrapper Don't clobber error from sys_ioctl in HDIO_GETGEO compat wrapper. Signed-off-by: Andreas Schwab <schwab@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:14 -07:00
Stephen Mollett	c007c06e3c	udf: decrement correct link count in udf_rmdir It appears that a minor thinko occurred in udf_rmdir and the (already-cleared) link count on the directory that is being removed was being decremented instead of the link count on its parent directory. This gives rise to lots of kernel messages similar to: UDF-fs warning (device loop1): udf_rmdir: empty directory has nlink != 2 (8) when removing directory trees. No other ill effects have been observed but I guess it could theoretically result in the link count overflowing on a very long-lived, much modified directory. Signed-off-by: Stephen Mollett <molletts@yahoo.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Jan Kara <jack@ucw.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:14 -07:00
OGAWA Hirofumi	c483bab099	fat: fix VFAT compat ioctls on 64-bit systems If you compile and run the below test case in an msdos or vfat directory on an x86-64 system with -m32 you'll get garbage in the kernel_dirent struct followed by a SIGSEGV. The patch fixes this. Reported and initial fix by Bart Oldeman #include <sys/types.h> #include <sys/ioctl.h> #include <dirent.h> #include <stdio.h> #include <unistd.h> #include <fcntl.h> struct kernel_dirent { long d_ino; long d_off; unsigned short d_reclen; char d_name[256]; /* We must not include limits.h! */ }; #define VFAT_IOCTL_READDIR_BOTH _IOR('r', 1, struct kernel_dirent [2]) #define VFAT_IOCTL_READDIR_SHORT _IOR('r', 2, struct kernel_dirent [2]) int main(void) { int fd = open(".", O_RDONLY); struct kernel_dirent de[2]; while (1) { int i = ioctl(fd, VFAT_IOCTL_READDIR_BOTH, (long)de); if (i == -1) break; if (de[0].d_reclen == 0) break; printf("SFN: reclen=%2d off=%d ino=%d, %-12s", de[0].d_reclen, de[0].d_off, de[0].d_ino, de[0].d_name); if (de[1].d_reclen) printf("\tLFN: reclen=%2d off=%d ino=%d, %s", de[1].d_reclen, de[1].d_off, de[1].d_ino, de[1].d_name); printf("\n"); } return 0; } Signed-off-by: Bart Oldeman <bartoldeman@users.sourceforge.net> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:14 -07:00
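The compat problem in the test case above comes down to layout: with `-m32`, `long` is 32 bits, so both the struct and the ioctl command number (which encodes the argument size) differ from what a 64-bit kernel uses natively. Below is a minimal sketch of the kind of translation a compat handler has to perform. It assumes the hypothetical names `compat_dirent_sketch`, `VFAT_IOCTL_READDIR_BOTH32` and `dirent_to_compat()`, and it is not the kernel's actual fat compat code.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* The test case's struct as a 64-bit (LP64) build lays it out. */
struct native_dirent {
    long d_ino;
    long d_off;
    unsigned short d_reclen;
    char d_name[256];
};

/* Hypothetical layout of the same struct in a -m32 build: long is 32 bits. */
struct compat_dirent_sketch {
    int32_t d_ino;
    int32_t d_off;
    uint16_t d_reclen;
    char d_name[256];
};

/* _IOR() encodes sizeof(argument) in the command, so the two ABIs
 * produce different command numbers for the "same" ioctl. */
#define VFAT_IOCTL_READDIR_BOTH   _IOR('r', 1, struct native_dirent [2])
#define VFAT_IOCTL_READDIR_BOTH32 _IOR('r', 1, struct compat_dirent_sketch [2])

/* Narrow one native entry into the 32-bit layout, refusing values that
 * would be truncated (hypothetical helper). */
static int dirent_to_compat(const struct native_dirent *in,
                            struct compat_dirent_sketch *out)
{
    if (in->d_ino < INT32_MIN || in->d_ino > INT32_MAX ||
        in->d_off < INT32_MIN || in->d_off > INT32_MAX)
        return -EOVERFLOW;
    out->d_ino = (int32_t)in->d_ino;
    out->d_off = (int32_t)in->d_off;
    out->d_reclen = in->d_reclen;
    memcpy(out->d_name, in->d_name, sizeof(out->d_name));
    return 0;
}
```

Copying the native struct into a 32-bit caller's buffer without this step writes 64-bit fields where the program expects 32-bit ones, which matches the garbage the test case prints.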
Jan Kara	4f99ed67cc	ext2: copy i_flags to inode flags on write Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into ext2-specific i_flags. Hence, when someone sets these flags via a different interface than ioctl, they are stored correctly. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:13 -07:00
OGAWA Hirofumi	28ec039c21	fat: don't use free_clusters for fat32 It seems that the recent Windows changed specification, and it's undocumented. Windows doesn't update ->free_clusters correctly. This patch doesn't use ->free_clusters by default. (instead, add "usefree" for forcing to use it) Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Juergen Beisert <juergen127@kreuzholzen.de> Cc: Andreas Schwab <schwab@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:13 -07:00
Milind Arun Choudhary	5ab2f7e0fd	reiserfs: use __set_current_state() use __set_current_state(TASK_*) instead of current->state = TASK_*, in fs/reiserfs Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com> Cc: <reiserfs-dev@namesys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:13 -07:00
Pavel Emelianov	97f0678467	jbd: check for error returned by kthread_create on creating journal thread If the thread fails to be created, the subsequent wait_event will hang forever. This is likely to happen if the kernel hits the max_threads limit. This will be critical for virtualization systems that limit the number of tasks and kernel memory usage within the container. (akpm: JBD should be converted fully to the kthread API: kthread_should_stop() and kthread_stop()). Cc: <linux-ext4@vger.kernel.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:13 -07:00
Miklos Szeredi	ee6f958291	check privileges before setting mount propagation There's a missing check for CAP_SYS_ADMIN in do_change_type(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:12 -07:00
Jan Kara	28be5abb40	ext3: copy i_flags to inode flags on write A patch that stores inode flags such as S_IMMUTABLE, S_APPEND, etc. from i_flags to EXT3_I(inode)->i_flags when inode is written to disk. The same thing is done on GETFLAGS ioctl. Quota code changes these flags on quota files (to make it harder for sysadmin to screw himself) and these changes were not correctly propagated into the filesystem (especially, lsattr did not show them and users were wondering...). Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into ext3-specific i_flags. Hence, when someone sets these flags via a different interface than ioctl, they are stored correctly. Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:12 -07:00
Pavel Emelianov	b5e618181a	Introduce a handy list_first_entry macro There are many places in the kernel where constructions like foo = list_entry(head->next, struct foo_struct, list); are used. The code might look more descriptive and neat if using the macro list_first_entry(head, type, member) \ list_entry((head)->next, type, member) Here is the macro itself and the examples of its usage in the generic code. If it turns out to be useful, I can prepare a set of patches to inject it into arch-specific code, drivers, networking, etc. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:11 -07:00
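A userspace re-creation of the `list_first_entry` macro, for illustration only. Here `container_of()` is a simplified version without the kernel's type checking, and `list_init()`/`list_add_tail()` are cut-down stand-ins for the `<linux/list.h>` helpers:

```c
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
/* The new macro: the entry that follows the list head. */
#define list_first_entry(ptr, type, member) \
    list_entry((ptr)->next, type, member)

static void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

/* Insert entry just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

struct foo_struct {
    int value;
    struct list_head list;
};
```

As with the kernel macro, `list_first_entry()` on an empty list does not return NULL. It yields a pointer computed from the list head itself, so callers must check for emptiness first.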
Eric W. Biederman	1bd0cf1fc7	smbfs: remove unnecessary allow_signal Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:11 -07:00
Jeffrey Layton	3361c7bebb	make iunique use a do/while loop rather than its obscure goto loop A while back, Christoph mentioned that he thought that iunique ought to be cleaned up to use a more conventional loop construct. This patch does that, turning the strange goto loop into a do/while. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:10 -07:00
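For reference, the do/while shape that iunique() now uses, combined with the 32-bit static counter from the inode numbering commit above, looks roughly like the sketch below. It rests on stated assumptions: `sketch_iunique()` and `ino_in_use()` are hypothetical, and the linear scan stands in for the kernel's inode-hash lookup, which the real code performs under a lock that this single-threaded sketch omits.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the inode hash lookup: numbers already in use. */
static bool ino_in_use(const unsigned int *used, size_t n, unsigned int ino)
{
    for (size_t i = 0; i < n; i++)
        if (used[i] == ino)
            return true;
    return false;
}

/*
 * iunique()-style allocation: a static 32-bit counter (unsigned int on
 * Linux ABIs) that skips the reserved range [0, max_reserved], including
 * after it wraps, and keeps advancing until it finds an unused number.
 * Assumes at least one free number exists; otherwise it never returns.
 */
static unsigned int sketch_iunique(const unsigned int *used, size_t n,
                                   unsigned int max_reserved)
{
    static unsigned int counter;
    unsigned int res;

    do {
        if (counter <= max_reserved)
            counter = max_reserved + 1;
        res = counter++;
    } while (ino_in_use(used, n, res));

    return res;
}
```

With numbers 11 and 12 taken and `max_reserved` set to 10, the first call skips both and returns 13, and the next call returns 14.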
John Johansen	9d0633cfed	Remove redundant check from proc_sys_setattr() notify_change() already calls security_inode_setattr() before calling iop->setattr. Alan sayeth This is a behaviour change on all of these and limits some behaviour of existing established security modules When inode_change_ok is called it has side effects. This includes clearing the SGID bit on attribute changes caused by chmod. If you make this change the results of some rulesets may be different before or after the change is made. I'm not saying the change is wrong but it does change behaviour so that needs looking at closely (ditto all other attribute twiddles) Signed-off-by: Steve Beattie <sbeattie@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: John Johansen <jjohansen@suse.de> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:10 -07:00
John Johansen	1e8123fded	Remove redundant check from proc_setattr() notify_change() already calls security_inode_setattr() before calling iop->setattr. Signed-off-by: Tony Jones <tonyj@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: John Johansen <jjohansen@suse.de> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:10 -07:00
Martin Peschke	09f0892ec7	proc: cleanup: use seq_release_private() where appropriate We can save some lines of code by using seq_release_private(). Signed-off-by: Martin Peschke <mp3@de.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:09 -07:00
Christoph Hellwig	6272e26679	cleanup compat ioctl handling Merge all compat ioctl handling into compat_ioctl.c instead of splitting it over compat.c and compat_ioctl.c. This also allows to get rid of ioctl32.h Signed-off-by: Christoph Hellwig <hch@lst.de> Looks-good-to: Andi Kleen <ak@suse.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:09 -07:00
Philippe De Muyter	19d0e8ce85	partition: add support for sysv68 partitions Add support for the Motorola sysv68 disk partition (slices in motorola doc). Signed-off-by: Philippe De Muyter <phdm@macqel.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:09 -07:00
Christoph Hellwig	644fd4f5de	merge compat_ioctl.h into compat_ioctl.c Now that there is no arch-specific compat ioctl handling left, there is no point in having a separate compat_ioctl.h, so merge it into compat_ioctl.c Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:09 -07:00
Milind Arun Choudhary	1525dccbc2	ROUND_UP macro cleanup in fs/smbfs/request.c ROUND_UP macro cleanup use ALIGN Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:09 -07:00
Milind Arun Choudhary	022a169244	ROUND_UP macro cleanup in fs/(select|compat|readdir).c ROUND_UP macro cleanup: use ALIGN or DIV_ROUND_UP wherever appropriate. Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:09 -07:00
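The two macros this cleanup switches to can be written as below. These are simplified forms of the kernel definitions (ALIGN only works for power-of-two alignments), and `fd_bitmap_bytes()` is a hypothetical helper in the spirit of the fd-set sizing in fs/select.c, not a copy of it:

```c
#include <stddef.h>

/* Round x up to the next multiple of a; a must be a power of two. */
#define ALIGN(x, a)        (((x) + ((a) - 1)) & ~((a) - 1))
/* Integer ceiling division: how many d-sized chunks are needed for n. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Bytes needed for a bitmap of nr descriptors, rounded up to whole longs. */
static size_t fd_bitmap_bytes(size_t nr)
{
    const size_t bits_per_long = 8 * sizeof(long);

    return DIV_ROUND_UP(nr, bits_per_long) * sizeof(long);
}
```

ALIGN answers "where does the next aligned slot start", while DIV_ROUND_UP answers "how many units are needed"; an open-coded ROUND_UP was being used for both.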
Alexey Dobriyan	7e80d0d0b6	i386: sched.h inclusion from module.h is baack linux/module.h -> linux/elf.h -> asm-i386/elf.h -> linux/utsname.h -> linux/sched.h Noticeably cut the number of files which are rebuilt upon touching sched.h and cut down the junk pulled in by every module.h inclusion. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:08 -07:00
Alexey Dobriyan	9d65cb4a17	Fix race between cat /proc/*/wchan and rmmod et al kallsyms_lookup() can go iterating over modules list unprotected which is OK for emergency situations (oops), but not OK for regular stuff like /proc/*/wchan. Introduce lookup_symbol_name()/lookup_module_symbol_name() which copy symbol name into caller-supplied buffer or return -ERANGE. All copying is done with module_mutex held, so... Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:08 -07:00
Alexey Dobriyan	ffb4512276	Simplify kallsyms_lookup() Several kallsyms_lookup() callers pass dummy arguments but only need, say, the module's name. Make kallsyms_lookup() accept NULLs where possible. This also makes the picture clearer about which interfaces are needed for all the symbol-resolving business. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:08 -07:00
kalash nainwal	98701d1b0f	(re)register_binfmt returns with -EBUSY When a binary format is unregistered and re-registered, register_binfmt fails with -EBUSY. The reason is that unregister_binfmt does not set fmt->next to NULL, and seeing (fmt->next != NULL), register_binfmt fails with -EBUSY. One can find his way around by explicitly setting fmt->next to NULL after unregistering, but that is kind of unclean (one should better be using only the interfaces, and not the internal members, isn't it?) Attached one-liner can fix it. Signed-off-by: Kalash Nainwal <kalash.nainwal@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:08 -07:00
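The failure mode is easy to reproduce with a cut-down registration list. This sketch assumes a hypothetical `struct fmt` rather than the real `struct linux_binfmt`. `register_fmt()` treats a non-NULL `next` as "already registered" and also scans the list for duplicates, so the unregister path has to clear `next` for re-registration to work:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down analogue of a binary-format descriptor. */
struct fmt {
    const char *name;
    struct fmt *next;
};

static struct fmt *formats;  /* head of the registered list */

static int register_fmt(struct fmt *fmt)
{
    /* A stale next pointer is taken to mean "already registered". */
    if (fmt->next)
        return -EBUSY;
    /* The list tail has next == NULL, so also look for fmt itself. */
    for (struct fmt **p = &formats; *p; p = &(*p)->next)
        if (*p == fmt)
            return -EBUSY;
    fmt->next = formats;
    formats = fmt;
    return 0;
}

static int unregister_fmt(struct fmt *fmt)
{
    for (struct fmt **p = &formats; *p; p = &(*p)->next) {
        if (*p == fmt) {
            *p = fmt->next;
            fmt->next = NULL;  /* the one-line fix: allow re-registration */
            return 0;
        }
    }
    return -EINVAL;
}
```

Without the `fmt->next = NULL` line, a format that was unlinked from the middle of the list keeps pointing at its old successor, and the next `register_fmt()` call rejects it with -EBUSY.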
Randy Dunlap	e63340ae6b	header cleaning: don't include smp_lock.h when not used Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:07 -07:00
Adrian Bunk	e5f00f42f3	make remove_inode_dquot_ref() static remove_inode_dquot_ref() can now become static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:05 -07:00
Alexey Dobriyan	ca509f69de	Protect tty drivers list with tty_mutex Additions and removal from tty_drivers list were just done as well as iterating on it for /proc/tty/drivers generation. testing: modprobe/rmmod loop of simple module which does nothing but tty_register_driver() vs cat /proc/tty/drivers loop BUG: unable to handle kernel paging request at virtual address 6b6b6b6b printing eip: c01cefa7 *pde = 00000000 Oops: 0000 [#1] PREEMPT last sysfs file: devices/pci0000:00/0000:00:1d.7/usb5/5-0:1.0/bInterfaceProtocol Modules linked in: ohci_hcd af_packet e1000 ehci_hcd uhci_hcd usbcore xfs CPU: 0 EIP: 0060:[<c01cefa7>] Not tainted VLI EFLAGS: 00010297 (2.6.21-rc4-mm1 #4) EIP is at vsnprintf+0x3a4/0x5fc eax: 6b6b6b6b ebx: f6cb50f2 ecx: 6b6b6b6b edx: fffffffe esi: c0354700 edi: f6cb6000 ebp: 6b6b6b6b esp: f31f5e68 ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 Process cat (pid: 31864, ti=f31f4000 task=c1998030 task.ti=f31f4000) Stack: 00000000 c0103f20 c013003a c0103f20 00000000 f6cb50da 0000000a 00000f0e f6cb50f2 00000010 00000014 ffffffff ffffffff 00000007 c0354753 f6cb50f2 f73e39dc f73e39dc 00000001 c0175416 f31f5ed8 f31f5ed4 0ee00000 f32090bc Call Trace: [<c0103f20>] restore_nocheck+0x12/0x15 [<c013003a>] mark_held_locks+0x6d/0x86 [<c0103f20>] restore_nocheck+0x12/0x15 [<c0175416>] seq_printf+0x2e/0x52 [<c0192895>] show_tty_range+0x35/0x1f3 [<c0175416>] seq_printf+0x2e/0x52 [<c0192add>] show_tty_driver+0x8a/0x1d9 [<c01758f6>] seq_read+0x70/0x2ba [<c0175886>] seq_read+0x0/0x2ba [<c018d8e6>] proc_reg_read+0x63/0x9f [<c015e764>] vfs_read+0x7d/0xb5 [<c018d883>] proc_reg_read+0x0/0x9f [<c015eab1>] sys_read+0x41/0x6a [<c0103e4e>] sysenter_past_esp+0x5f/0x99 ======================= Code: 00 8b 4d 04 e9 44 ff ff ff 8d 4d 04 89 4c 24 50 8b 6d 00 81 fd ff 0f 00 00 b8 a4 c1 35 c0 0f 46 e8 8b 54 24 2c 89 e9 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 89 c6 8b 44 24 28 89 EIP: [<c01cefa7>] vsnprintf+0x3a4/0x5fc SS:ESP 0068:f31f5e68 Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:05 -07:00
Mark Fasheh	ef51c97623	Remove do_sync_file_range() Remove do_sync_file_range() and convert callers to just use do_sync_mapping_range(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:04 -07:00
Randy Dunlap	880ebdc516	reiserfs: proc support requires PROC_FS REISER_FS /proc option needs to depend on PROC_FS. fs/reiserfs/procfs.c: In function 'show_super': fs/reiserfs/procfs.c:134: error: 'reiserfs_proc_info_data_t' has no member named 'max_hash_collisions' fs/reiserfs/procfs.c:134: error: 'reiserfs_proc_info_data_t' has no member named 'breads' fs/reiserfs/procfs.c:135: error: 'reiserfs_proc_info_data_t' has no member named 'bread_miss' fs/reiserfs/procfs.c:135: error: 'reiserfs_proc_info_data_t' has no member named 'search_by_key' fs/reiserfs/procfs.c:136: error: 'reiserfs_proc_info_data_t' has no member named 'search_by_key_fs_changed' fs/reiserfs/procfs.c:136: error: 'reiserfs_proc_info_data_t' has no member named 'search_by_key_restarted' fs/reiserfs/procfs.c:137: error: 'reiserfs_proc_info_data_t' has no member named 'insert_item_restarted' fs/reiserfs/procfs.c:137: error: 'reiserfs_proc_info_data_t' has no member named 'paste_into_item_restarted' fs/reiserfs/procfs.c:138: error: 'reiserfs_proc_info_data_t' has no member named 'cut_from_item_restarted' fs/reiserfs/procfs.c:139: error: 'reiserfs_proc_info_data_t' has no member named 'delete_solid_item_restarted' fs/reiserfs/procfs.c:139: error: 'reiserfs_proc_info_data_t' has no member named 'delete_item_restarted' fs/reiserfs/procfs.c:140: error: 'reiserfs_proc_info_data_t' has no member named 'leaked_oid' fs/reiserfs/procfs.c:140: error: 'reiserfs_proc_info_data_t' has no member named 'leaves_removable' fs/reiserfs/procfs.c: In function 'show_per_level': fs/reiserfs/procfs.c:184: error: 'reiserfs_proc_info_data_t' has no member named 'balance_at' fs/reiserfs/procfs.c:185: error: 'reiserfs_proc_info_data_t' has no member named 'sbk_read_at' fs/reiserfs/procfs.c:186: error: 'reiserfs_proc_info_data_t' has no member named 'sbk_fs_changed' fs/reiserfs/procfs.c:187: error: 'reiserfs_proc_info_data_t' has no member named 'sbk_restarted' fs/reiserfs/procfs.c:188: error: 'reiserfs_proc_info_data_t' has no member named 'free_at' fs/reiserfs/procfs.c:189: error: 'reiserfs_proc_info_data_t' has no member named 'items_at' fs/reiserfs/procfs.c:190: error: 'reiserfs_proc_info_data_t' has no member named 'can_node_be_removed' fs/reiserfs/procfs.c:191: error: 'reiserfs_proc_info_data_t' has no member named 'lnum' fs/reiserfs/procfs.c:192: error: 'reiserfs_proc_info_data_t' has no member named 'rnum' fs/reiserfs/procfs.c:193: error: 'reiserfs_proc_info_data_t' has no member named 'lbytes' fs/reiserfs/procfs.c:194: error: 'reiserfs_proc_info_data_t' has no member named 'rbytes' fs/reiserfs/procfs.c:195: error: 'reiserfs_proc_info_data_t' has no member named 'get_neighbors' fs/reiserfs/procfs.c:196: error: 'reiserfs_proc_info_data_t' has no member named 'get_neighbors_restart' fs/reiserfs/procfs.c:197: error: 'reiserfs_proc_info_data_t' has no member named 'need_l_neighbor' fs/reiserfs/procfs.c:197: error: 'reiserfs_proc_info_data_t' has no member named 'need_r_neighbor' fs/reiserfs/procfs.c: In function 'show_bitmap': fs/reiserfs/procfs.c:224: error: 'reiserfs_proc_info_data_t' has no member named 'free_block' fs/reiserfs/procfs.c:225: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap' fs/reiserfs/procfs.c:226: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap' fs/reiserfs/procfs.c:227: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap' fs/reiserfs/procfs.c:228: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap' fs/reiserfs/procfs.c:229: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap' fs/reiserfs/procfs.c:230: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap' fs/reiserfs/procfs.c:230: error: 'reiserfs_proc_info_data_t' has no member named 'scan_bitmap' fs/reiserfs/procfs.c: In function 'show_journal': fs/reiserfs/procfs.c:384: error: 'reiserfs_proc_info_data_t' has no member named 'journal' fs/reiserfs/procfs.c:385: error: 'reiserfs_proc_info_data_t' has no member named 'journal' fs/reiserfs/procfs.c:386: error: 'reiserfs_proc_info_data_t' has no member named 'journal' fs/reiserfs/procfs.c:387: error: 'reiserfs_proc_info_data_t' has no member named 'journal' fs/reiserfs/procfs.c:388: error: 'reiserfs_proc_info_data_t' has no member named 'journal' fs/reiserfs/procfs.c:389: error: 'reiserfs_proc_info_data_t' has no member named 'journal' fs/reiserfs/procfs.c:390: error: 'reiserfs_proc_info_data_t' has no member named 'journal' fs/reiserfs/procfs.c:391: error: 'reiserfs_proc_info_data_t' has no member named 'journal' fs/reiserfs/procfs.c:392: error: 'reiserfs_proc_info_data_t' has no member named 'journal' fs/reiserfs/procfs.c:393: error: 'reiserfs_proc_info_data_t' has no member named 'journal' fs/reiserfs/procfs.c:394: error: 'reiserfs_proc_info_data_t' has no member named 'journal' fs/reiserfs/procfs.c:395: error: 'reiserfs_proc_info_data_t' has no member named 'journal' fs/reiserfs/procfs.c:395: error: 'reiserfs_proc_info_data_t' has no member named 'journal' fs/reiserfs/procfs.c:395: error: 'reiserfs_proc_info_data_t' has no member named 'journal' fs/reiserfs/procfs.c: In function 'reiserfs_proc_info_init': fs/reiserfs/procfs.c:504: warning: implicit declaration of function '__PINFO' fs/reiserfs/procfs.c:504: error: request for member 'lock' in something not a structure or union fs/reiserfs/procfs.c: In function 'reiserfs_proc_info_done': fs/reiserfs/procfs.c:544: error: request for member 'lock' in something not a structure or union fs/reiserfs/procfs.c:545: error: request for member 'exiting' in something not a structure or union fs/reiserfs/procfs.c:546: error: request for member 'lock' in something not a structure or union Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:04 -07:00
Alexey Dobriyan	19c5d45a09	/proc/*/oom_score oops re badness Eternal quest to make while true; do cat /proc/fs/xfs/stat >/dev/null 2>/dev/null; done while true; do find /proc -type f 2>/dev/null | xargs cat >/dev/null 2>/dev/null; done while true; do modprobe xfs; rmmod xfs; done work reliably continues and now kernel oopses in the following way: BUG: unable to handle ... at virtual address 6b6b6b6b EIP is at badness process: cat proc_oom_score proc_info_read sys_fstat64 vfs_read proc_info_read sys_read Failing code is prefetch hidden in list_for_each_entry() in badness(). badness() is reachable from two points. One is proc_oom_score, another is out_of_memory() => select_bad_process() => badness(). Second path grabs tasklist_lock, while first doesn't. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:04 -07:00
Eric Dumazet	c23fbb6bcb	VFS: delay the dentry name generation on sockets and pipes 1) Introduces a new method in 'struct dentry_operations'. This method, called d_dname(), might be called from d_path() to build a pathname for special filesystems. It is called without locks. Future patches (if we succeed in having one common dentry for all pipes/sockets) may need to change the prototype of this method, but we now use: char *d_dname(struct dentry *dentry, char *buffer, int buflen); 2) Adds a dynamic_dname() helper function that eases d_dname() implementations 3) Defines d_dname method for sockets: No more sprintf() at socket creation. This is delayed up to the moment someone does an access to /proc/pid/fd/... 4) Defines d_dname method for pipes: No more sprintf() at pipe creation. This is delayed up to the moment someone does an access to /proc/pid/fd/... A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a nice speedup on my Pentium(M) 1.6 GHz: 3.090 s instead of 3.450 s Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Christoph Hellwig <hch@infradead.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:03 -07:00
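The lazy naming idea, computing a `pipe:[ino]` string only when someone asks for it, can be sketched in userspace. `sketch_dynamic_dname()` and `sketch_pipe_dname()` are hypothetical analogues of the helper and method described above; the kernel helper reports overflow with an error pointer rather than NULL. The name is written at the end of the buffer because d_path() assembles paths backwards from the end:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a name into a small temporary, then copy it (with its NUL) to
 * the end of buffer and return a pointer to its first character.
 * Returns NULL if the name does not fit.
 */
static char *sketch_dynamic_dname(char *buffer, int buflen, const char *fmt, ...)
{
    char temp[64];
    va_list args;
    int n, sz;

    va_start(args, fmt);
    n = vsnprintf(temp, sizeof(temp), fmt, args);
    va_end(args);

    if (n < 0)
        return NULL;
    sz = n + 1;  /* include the terminating NUL */
    if (sz > (int)sizeof(temp) || sz > buflen)
        return NULL;

    buffer += buflen - sz;
    return memcpy(buffer, temp, sz);
}

/* What a pipe's d_dname method might produce, computed only on demand. */
static char *sketch_pipe_dname(unsigned long ino, char *buffer, int buflen)
{
    return sketch_dynamic_dname(buffer, buflen, "pipe:[%lu]", ino);
}
```

The saving in the benchmark comes from the fact that pipe() itself no longer formats anything; only readers of /proc/pid/fd/ pay for the formatting.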
Miklos Szeredi	2793274298	add file position info to proc Add support for finding out the current file position, open flags and possibly other info in the future. These new entries are added: /proc/PID/fdinfo/FD /proc/PID/task/TID/fdinfo/FD For each fd the information is provided in the following format: pos: 1234 flags: 0100002 [bunk@stusta.de: make struct proc_fdinfo_file_operations static] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:03 -07:00
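A consumer of the new fdinfo files only needs to parse the two `key: value` lines. This is a minimal sketch, assuming the format shown in the commit above (flags printed in octal) and a hypothetical `parse_fdinfo()` helper; any further lines are simply ignored:

```c
#include <stdio.h>

/*
 * Parse the "pos:" and "flags:" lines of a /proc/PID/fdinfo/FD entry.
 * Whitespace in the format also matches the newline between the lines.
 * Returns 0 on success, -1 if either field is missing.
 */
static int parse_fdinfo(const char *text, long long *pos, unsigned int *flags)
{
    return sscanf(text, "pos: %lld flags: %o", pos, flags) == 2 ? 0 : -1;
}
```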
Eric Dumazet	c5141e6d64	procfs: reorder struct pid_dentry to save space on 64bit archs, and constify them Change the order of fields of struct pid_entry (file fs/proc/base.c) in order to avoid a hole on 64bit archs. (8 bytes saved per object) Also change all pid_entry arrays to be const qualified, to make clear they must not be modified. Before (on x86_64) : # size fs/proc/base.o text data bss dec hex filename 15549 2192 0 17741 454d fs/proc/base.o After : # size fs/proc/base.o text data bss dec hex filename 17229 176 0 17405 43fd fs/proc/base.o That's 336 bytes saved on kernel size on x86_64 Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:03 -07:00
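The size win comes from padding: on LP64, a 4-byte field placed before a pointer leaves a 4-byte hole. The sketch below uses a hypothetical field set loosely modelled on `struct pid_entry`, not its real definition, to show the effect. The `const` table can also be placed in read-only data, which `size(1)` counts under text; that is consistent with the data-to-text shift in the numbers above.

```c
#include <stddef.h>

/* Hypothetical field order with holes on LP64. */
struct entry_before {
    int len;            /* 4 bytes, then 4 bytes of padding */
    const char *name;
    unsigned int mode;  /* 4 bytes, then 4 more bytes of padding */
    const void *ops;
};

/* Same fields, pointer first: the two 4-byte fields share one 8-byte slot. */
struct entry_after {
    const char *name;
    int len;
    unsigned int mode;
    const void *ops;
};

/* A const-qualified table, as the commit does for the pid_entry arrays. */
static const struct entry_after table[] = {
    { "status", 6, 0444, NULL },
    { "maps",   4, 0444, NULL },
};
```

On x86_64 this takes the struct from 32 to 24 bytes, the same 8 bytes per object the commit reports; on 32-bit targets both orders are 16 bytes.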
Kees Cook	5096add84b	proc: maps protection The /proc/pid/ "maps", "smaps", and "numa_maps" files contain sensitive information about the memory location and usage of processes. Issues: - maps should not be world-readable, especially if programs expect any kind of ASLR protection from local attackers. - maps cannot just be 0400 because "-D_FORTIFY_SOURCE=2 -O2" makes glibc check the maps when %n is in a *printf call, and a setuid(getuid()) process wouldn't be able to read its own maps file. (For reference see http://lkml.org/lkml/2006/1/22/150) - a system-wide toggle is needed to allow prior behavior in the case of non-root applications that depend on access to the maps contents. This change implements a check using "ptrace_may_attach" before allowing access to read the maps contents. To control this protection, the new knob /proc/sys/kernel/maps_protect has been added, with corresponding updates to the procfs documentation. [akpm@linux-foundation.org: build fixes] [akpm@linux-foundation.org: New sysctl numbers are old hat] Signed-off-by: Kees Cook <kees@outflux.net> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:02 -07:00
Christoph Hellwig	5843205b55	namei.c: remove utterly outdated comment We don't have a routine called namei() anymore since at least 2.3.x, and the comment is just totally out of sync with the current lookup logic. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:02 -07:00
Christoph Hellwig	acb0c854fa	vfs: remove superflous sb == NULL checks inode->i_sb is always set, no need to check for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:02 -07:00
Alexey Dobriyan	578c8183c1	proc: remove pathetic ->deleted WARN_ON WARN_ON(de && de->deleted); is sooo unreliable. Why? proc_lookup remove_proc_entry =========== ================= lock_kernel(); spin_lock(&proc_subdir_lock); [find proc entry] spin_unlock(&proc_subdir_lock); spin_lock(&proc_subdir_lock); [find proc entry] proc_get_inode ============== WARN_ON(de && de->deleted); ... if (!atomic_read(&de->count)) free_proc_entry(de); else de->deleted = 1; So, if you have some strange oops [1] and don't see this WARN_ON, it means nothing. [1] try_module_get() of module which doesn't exist, two lines below should suffice, or not? Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:02 -07:00
Darrick J. Wong	59cd0cbc75	Fix race between proc_readdir and remove_proc_entry Fix the following race: proc_readdir remove_proc_entry ============ ================= spin_lock(&proc_subdir_lock); [choose PDE to start filldir from] spin_unlock(&proc_subdir_lock); spin_lock(&proc_subdir_lock); [find PDE] [free PDE, refcount is 0] spin_unlock(&proc_subdir_lock); /* boom */ if (filldir(dirent, de->name, ... [de_put on error path --adobriyan] Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:02 -07:00
Alexey Dobriyan	7695650a92	Fix race between proc_get_inode() and remove_proc_entry() proc_lookup: lock_kernel(); spin_lock(&proc_subdir_lock); [find PDE with refcount 0] spin_unlock(&proc_subdir_lock); then remove_proc_entry: spin_lock(&proc_subdir_lock); [find PDE with refcount 0] [check refcount and free PDE] spin_unlock(&proc_subdir_lock); then proc_lookup, in proc_get_inode: de_get(de); /* boom */ Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:01 -07:00
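The three procfs races above share one shape: a lookup finds an entry under the lock, drops the lock, and only then takes a reference, leaving a window in which the remover sees a zero refcount and frees the entry. The following is a minimal userspace sketch of the general "pin before unlocking" pattern, assuming a pthread mutex in place of proc_subdir_lock; the struct pde type and the pde_lookup_get(), pde_remove() and pde_put() helpers are hypothetical, and a `deleted` flag stands in for unlinking the entry from its directory list. It is not the procfs code.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry type loosely modeled on a procfs PDE; not kernel code. */
struct pde {
    char name[32];
    int count;      /* references held by lookups */
    int deleted;    /* removal requested while references were held */
};

/* One lock covers lookup, refcount changes and removal. */
static pthread_mutex_t subdir_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lookup that takes the reference BEFORE dropping the lock, so a remover
 * can never observe count == 0 between "found it" and "pinned it". */
static struct pde *pde_lookup_get(struct pde *e, const char *name)
{
    struct pde *found = NULL;

    pthread_mutex_lock(&subdir_lock);
    if (e != NULL && !e->deleted && strcmp(e->name, name) == 0) {
        e->count++;
        found = e;
    }
    pthread_mutex_unlock(&subdir_lock);
    return found;
}

/* Free immediately if unreferenced, otherwise defer to the last put.
 * Returns 1 if the entry was freed. */
static int pde_remove(struct pde *e)
{
    int freed = 0;

    pthread_mutex_lock(&subdir_lock);
    if (e->count == 0) {
        free(e);
        freed = 1;
    } else {
        e->deleted = 1;
    }
    pthread_mutex_unlock(&subdir_lock);
    return freed;
}

/* Drop a reference; the last put on a deleted entry frees it.
 * Returns 1 if the entry was freed. */
static int pde_put(struct pde *e)
{
    int freed = 0;

    pthread_mutex_lock(&subdir_lock);
    if (--e->count == 0 && e->deleted) {
        free(e);
        freed = 1;
    }
    pthread_mutex_unlock(&subdir_lock);
    return freed;
}
```

The key design point is that the refcount check in pde_remove() and the increment in pde_lookup_get() are serialized by the same lock, which is what closes the window shown in the diagrams.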
Miklos Szeredi	79c0b2df79	add filesystem subtype support There's a slight problem with filesystem type representation in fuse based filesystems. From the kernel's view, there are just two filesystem types: fuse and fuseblk. From the user's view there are lots of different filesystem types. The user is not even much concerned if the filesystem is fuse based or not. So there's a conflict of interest in how this should be represented in fstab, mtab and /proc/mounts. The current scheme is to encode the real filesystem type in the mount source. So an sshfs mount looks like this: sshfs#user@server:/ /mnt/server fuse rw,nosuid,nodev,... This url-ish syntax works OK for sshfs and similar filesystems. However for block device based filesystems (ntfs-3g, zfs) it doesn't work, since the kernel expects the mount source to be a real device name. A possibly better scheme would be to encode the real type in the type field as "type.subtype". So fuse mounts would look like this: /dev/hda1 /mnt/windows fuseblk.ntfs-3g rw,... user@server:/ /mnt/server fuse.sshfs rw,nosuid,nodev,... This patch adds the necessary code to the kernel so that this can be correctly displayed in /proc/mounts. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:01 -07:00
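The proposed "type.subtype" convention splits the type field at the first '.', so "fuseblk.ntfs-3g" names base type fuseblk with subtype ntfs-3g. The following is a minimal userspace sketch of that split, such as a tool reading /proc/mounts might perform; split_fs_type() is a hypothetical helper, not a kernel or libc function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace helper (not the kernel's code): split a mount type
 * field such as "fuse.sshfs" at its first '.' into a base type ("fuse") and
 * a subtype ("sshfs"). Both output buffers must be at least 1 byte; results
 * are NUL-terminated and truncated to fit. Returns 1 if a subtype was
 * present, 0 otherwise. */
static int split_fs_type(const char *field,
                         char *type, size_t type_len,
                         char *subtype, size_t subtype_len)
{
    const char *dot = strchr(field, '.');
    size_t base_len = dot ? (size_t)(dot - field) : strlen(field);

    if (base_len >= type_len)
        base_len = type_len - 1;          /* truncate the base type */
    memcpy(type, field, base_len);
    type[base_len] = '\0';

    if (dot == NULL) {
        subtype[0] = '\0';                /* plain type, e.g. "ext3" */
        return 0;
    }
    strncpy(subtype, dot + 1, subtype_len - 1);
    subtype[subtype_len - 1] = '\0';      /* strncpy may not terminate */
    return 1;
}
```

For example, splitting "fuse.sshfs" yields "fuse" and "sshfs", while a plain "ext3" yields "ext3" with an empty subtype.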
Davide Libenzi	6192bd536f	epoll: optimizations and cleanups Epoll is doing multiple passes over the ready set at the moment, because of the constraints over the f_op->poll() call. Looking at the code again, I noticed that we already hold the epoll semaphore in read, and this (together with other locking conditions that hold while doing an epoll_wait()) can lead to a smarter way [1] to "ship" events to userspace (in a single pass). Here is a stress application that can be used to test the new code. It spawns multiple threads and calls epoll_wait() and epoll_ctl() from many threads. Stress tested on my dual Opteron 254 w/out any problems. http://www.xmailserver.org/totalmess.c This is not a benchmark, just something that tries to stress and exploit possible problems with the new code. Also, I made a stupid micro-benchmark: http://www.xmailserver.org/epwbench.c [1] Considering that epoll must be thread-safe, there are five ways we can be hit during an epoll_wait() transfer loop (ep_send_events()): 1) The epoll fd going away and calling ep_free: this just can't happen, since we did an fget() in sys_epoll_wait. 2) An epoll_ctl(EPOLL_CTL_DEL): this can't happen because epoll_ctl() gets ep->sem in write, and we're holding it in read during ep_send_events(). 3) An fd stored inside the epoll fd going away: this can't happen because in eventpoll_release_file() we get ep->sem in write, and we're holding it in read during ep_send_events(). 4) Another epoll_wait() happening on another thread: they can both be inside ep_send_events() at the same time, but we get (splice) the ready-list under the spinlock, so each one will get its own ready list. Note that an fd cannot be inside more than one ready list at the same time, because ep_poll_callback() will not re-queue it if it sees it already linked: if (ep_is_linked(&epi->rdllink)) goto is_linked; Another case that can happen is two concurrent epoll_wait() calls, coming in with a userspace event buffer of size, say, ten. Suppose there are 50 events ready in the list. The first epoll_wait() will "steal" the whole list, while the second, seeing no events, will go to sleep. But at the end of ep_send_events() in the first epoll_wait(), we will re-inject surplus ready fds, and we will trigger the proper wake_up to the second epoll_wait(). 5) ep_poll_callback() hitting us asynchronously: this is the tricky part. As I said above, the ep_is_linked() test done inside ep_poll_callback() guarantees that as long as the item is linked to a list, ep_poll_callback() will not try to re-queue it (or read or write data on any of its members). When we do a list_del() in ep_send_events(), the item will still satisfy the ep_is_linked() test (whatever data is written in prev/next, it'll never be its own pointer), so ep_poll_callback() will still leave us alone. It's only after the eventual smp_mb()+INIT_LIST_HEAD(&epi->rdllink) that it'll become visible to ep_poll_callback(), but at that point we're already past it. [akpm@osdl.org: 80 cols] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:01 -07:00
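Point 5 of the epoll message relies on a property of circular intrusive lists: a node counts as "linked" unless its next pointer points back at itself, and a plain unlink does not reset the node's own pointers. The following single-threaded sketch, assuming a simplified list modeled loosely on the kernel's struct list_head (all names here are hypothetical, and the smp_mb() ordering the real code needs is omitted), shows that the node still tests as linked after removal until it is explicitly re-initialized.

```c
#include <stddef.h>

/* Simplified circular doubly linked list node; not the kernel's list.h. */
struct list_node {
    struct list_node *next, *prev;
};

/* An initialized (empty/unlinked) node points at itself. */
static void init_node(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

/* Same idea as ep_is_linked(): linked unless self-pointing. */
static int node_is_linked(const struct list_node *n)
{
    return n->next != n;
}

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink n from its list but leave n's own pointers untouched, so
 * node_is_linked(n) stays true until init_node(n) is called. */
static void list_del_node(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}
```

In this sketch a callback that checks node_is_linked() before re-queueing would keep skipping the item between list_del_node() and init_node(), which is the window the commit message relies on.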
Dmitriy Monakhov	fedee54d8f	ext3: dirindex error pointer issues - ext3_dx_find_entry() exits without setting the proper error pointer - do_split() exits without setting the proper error pointer. This is really painful because many callers contain the following code: de = do_split(handle,dir, &bh, frame, &hinfo, &retval); if (!(de)) return retval; <<< WOW retval wasn't changed by do_split(), so the caller failed <<< but returned SUCCESS :) - Rearrange do_split() error path. The current error path is really ugly; all this up and down jump stuff doesn't make the code easy to understand. [dmonakhov@sw.ru: fix annoying fake error messages] Signed-off-by: Monakhov Dmitriy <dmonakhov@openvz.org> Cc: Andreas Dilger <adilger@clusterfs.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Monakhov Dmitriy <dmonakhov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:01 -07:00
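The ext3 bug is a contract violation: a function that returns NULL on failure and reports the reason through an `int *err` out-parameter must store a status on every path, or a caller doing `if (!de) return retval;` returns a stale success value. The following is a minimal sketch of that contract; struct entry and find_entry() are hypothetical stand-ins, not the ext3 code.

```c
#include <errno.h>
#include <stddef.h>

struct entry { int key; };  /* hypothetical stand-in for a directory entry */

/* Returns a pointer on success or NULL on failure, and ALWAYS stores a
 * status in *err, so "if (!de) return retval;" in a caller is safe. */
static struct entry *find_entry(struct entry *tab, size_t n, int key, int *err)
{
    size_t i;

    if (tab == NULL) {
        *err = -EINVAL;
        return NULL;
    }
    for (i = 0; i < n; i++) {
        if (tab[i].key == key) {
            *err = 0;
            return &tab[i];
        }
    }
    /* Omitting a store like this one is the kind of bug the commit fixes:
     * the caller would see NULL but still read its old retval. */
    *err = -ENOENT;
    return NULL;
}
```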
Badari Pulavarty	e3222c4ecc	Merge sys_clone()/sys_unshare() nsproxy and namespace handling sys_clone() and sys_unshare() both make copies of nsproxy and its associated namespaces. But they have different code paths. This patch merges all the nsproxy and its associated namespace copy/clone handling (as much as possible). Posted on container list earlier for feedback. - Create a new nsproxy and its associated namespaces and pass it back to caller to attach it to right process. - Changed all copy_*_ns() routines to return a new copy of namespace instead of attaching it to task->nsproxy. - Moved the CAP_SYS_ADMIN checks out of copy_*_ns() routines. - Removed unnecessary !ns checks from copy_*_ns() and added BUG_ON() just in case. - Get rid of all individual unshare_*_ns() routines and make use of copy_*_ns() instead. [akpm@osdl.org: cleanups, warning fix] [clg@fr.ibm.com: remove dup_namespaces() declaration] [serue@us.ibm.com: fix CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) retval] [akpm@linux-foundation.org: fix build with CONFIG_SYSVIPC=n] Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <containers@lists.osdl.org> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:00 -07:00
Nick Piggin	4fc75ff481	exec: fix remove_arg_zero Petr Tesarik discovered a problem in remove_arg_zero(). He writes: When a script is loaded, load_script() replaces argv[0] with the name of the interpreter and the filename passed to the exec syscall. However, there is no guarantee that the length of the interpreter name plus the length of the filename is greater than the length of the original argv[0]. If the difference happens to cross a page boundary, setup_arg_pages() will call put_dirty_page() [aka install_arg_page()] with an address outside the VMA. Therefore, remove_arg_zero() must free all pages which would be unused after the argument is removed. So, rewrite the remove_arg_zero function without gotos, with a few comments, and with the commonly used explicit index/offset. This fixes the problem and makes it easier to understand as well. [a.p.zijlstra@chello.nl: add comment] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Petr Tesarik <ptesarik@suse.cz> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:00 -07:00
Robert P. J. Day	f87367a6b1	reiserfs: correct misspelled "REISERFS_PROC_INFO" to "CONFIG_REISERFS_PROC_INFO" Correct the misspelling of the preprocessor check of a Kconfig option to refer to CONFIG_REISERFS_PROC_INFO and not just the incorrect REISERFS_PROC_INFO. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:00 -07:00
Alexey Dobriyan	fe08a9d498	reiserfs: shrink superblock if no xattrs This makes in-core superblock fit into one cacheline here. Before: struct dentry * xattr_root; /* 124 4 */ /* --- cacheline 1 boundary (128 bytes) --- */ struct rw_semaphore xattr_dir_sem; /* 128 12 */ int j_errno; /* 140 4 */ }; /* size: 144, cachelines: 2 */ /* sum members: 142, holes: 1, sum holes: 2 */ /* last cacheline: 16 bytes */ After: int j_errno; /* 124 4 */ /* --- cacheline 1 boundary (128 bytes) --- */ }; /* size: 128, cachelines: 1 */ /* sum members: 126, holes: 1, sum holes: 2 */ Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: <reiserfs-dev@namesys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:00 -07:00
Dmitriy Monakhov	2d3466a348	reiserfs: possible null pointer dereference during resize sb_read may return NULL, so let's explicitly check it. If so, free the new bitmap blocks array; after this we may safely exit, as is done above during bitmap allocation. Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:59 -07:00
Dmitriy Monakhov	82f703bb8c	freevxfs: possible null pointer dereference fix sb_read may return NULL, so let's explicitly check it. Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:59 -07:00
Vignesh Babu BM	1368c4f248	is_power_of_2 in fs/block_dev.c Replace (n & (n-1)) in the context of power of 2 checks with is_power_of_2 Signed-off-by: vignesh babu <vignesh.babu@wipro.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:59 -07:00
Vignesh Babu BM	e1b5c1d3da	is_power_of_2 in fs/hfs Replace (n & (n-1)) in the context of power of 2 checks with is_power_of_2 Signed-off-by: vignesh babu <vignesh.babu@wipro.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:59 -07:00
Vignesh Babu BM	e7d709c096	is_power_of_2 in fat Replacing (n & (n-1)) in the context of power of 2 checks with is_power_of_2 Signed-off-by: vignesh babu <vignesh.babu@wipro.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:59 -07:00
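The three is_power_of_2 conversions above replace the bare test `(n & (n-1)) == 0`, which clears the lowest set bit and checks whether anything remains. On its own, that test also accepts n == 0, which is not a power of two. The following is a minimal sketch of the helper with the zero check folded in; it mirrors the definition in <linux/log2.h> but is written here as standalone userspace C.

```c
#include <stdbool.h>

/* n & (n - 1) clears the lowest set bit; a power of two has exactly one
 * set bit, so the result is 0. The n != 0 check rejects zero, which the
 * bare bit trick would wrongly accept. */
static inline bool is_power_of_2(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}
```

Using a named helper also documents intent at call sites such as block-size checks, where a raw bit expression is easy to misread.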
Florin Malita	3972b7f67b	devpts: add fsnotify create event Currently, devpts doesn't generate an fsnotify event upon pts creation because the regular vfs paths aren't involved. Deallocation, on the other hand, correctly generates a nameremove event thanks to the d_delete() invocation in devpts_pty_kill(). This patch adds the missing fsnotify_create() trigger in devpts_pty_new(). Signed-off-by: Florin Malita <fmalita@gmail.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:59 -07:00
Chris Snook	1ae7075bcd	use use SEEK_MAX to validate user lseek arguments Add SEEK_MAX and use it to validate lseek arguments from userspace. Signed-off-by: Chris Snook <csnook@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:59 -07:00
Chris Snook	7b8e89249b	use symbolic constants in generic lseek code Convert magic numbers to SEEK_* values from fs.h Signed-off-by: Chris Snook <csnook@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:59 -07:00
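The two lseek commits above validate the user-supplied whence value against a named upper bound instead of a magic number. The following is a minimal userspace sketch of that check. SKETCH_SEEK_MAX and check_whence() are hypothetical names standing in for the kernel's SEEK_MAX and its lseek code, and the sketch assumes SEEK_SET, SEEK_CUR and SEEK_END are 0, 1 and 2, as on Linux.

```c
#include <errno.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical stand-in for SEEK_MAX: the highest whence value accepted.
 * Assumes SEEK_SET, SEEK_CUR and SEEK_END are 0, 1 and 2. */
#define SKETCH_SEEK_MAX SEEK_END

/* Because whence is unsigned, one upper-bound comparison also rejects
 * negative values that userspace might pass. */
static int check_whence(unsigned int whence)
{
    return whence <= SKETCH_SEEK_MAX ? 0 : -EINVAL;
}
```

Later kernels added SEEK_DATA and SEEK_HOLE; with a single named bound, only that definition has to change.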
Andrew Morton	24c32d733d	mm: shrink parent dentries when shrinking slab Teach the dentry slab shrinker to aggressively shrink parent dentries when shrinking the dentry cache. This is done to attempt to improve the situation where the dentry slab cache gets a lot of internal fragmentation due to pages containing directory dentries. It is expected that this change will cause some of those dentries to be reaped earlier, and with less scanning. Needs careful testing. Cc: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:58 -07:00
Miklos Szeredi	d52b908646	fix quadratic behavior of shrink_dcache_parent() The time shrink_dcache_parent() takes grows quadratically with the depth of the tree under 'parent'. This starts to get noticeable at a depth of about 10,000. These kinds of depths don't occur normally, and filesystems which invoke shrink_dcache_parent() via d_invalidate() seem to have other depth dependent timings, so it's not even easy to expose this problem. However with FUSE it's easy to create a deep tree and d_invalidate() will also get called. This can make a syscall hang for a very long time. This is the original discovery of the problem by Russ Cox: http://article.gmane.org/gmane.comp.file-systems.fuse.devel/3826 The following patch fixes the quadratic behavior, by optionally allowing prune_dcache() to prune ancestors of a dentry in one go, instead of doing it one at a time. Common code in dput() and prune_one_dentry() is extracted into a new helper function d_kill(). shrink_dcache_parent() as well as shrink_dcache_sb() are converted to use the ancestry-pruner option. Only for shrink_dcache_memory() is this behavior not desirable, so it keeps using the old algorithm. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Maneesh Soni <maneesh@in.ibm.com> Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:58 -07:00
William Cohen	97dc32cdb1	reduce size of task_struct on 64-bit machines This past week I was playing around with that pahole tool (http://oops.ghostprotocols.net:81/acme/dwarves/) and looking at the size of various structs in the kernel. I was surprised by the size of the task_struct on x86_64, approaching 4K. I looked through the fields in task_struct and found that a number of them were declared as "unsigned long" rather than "unsigned int" despite them appearing okay as 32-bit sized fields. On x86_64 "unsigned long" ends up being 8 bytes in size and forces 8 byte alignment. Is there a reason they are "unsigned long"? The patch below drops the size of the struct from 3808 bytes (60 64-byte cachelines) to 3760 bytes (59 64-byte cachelines). A couple other fields in the task struct take a significant amount of space: struct thread_struct thread; 688 struct held_lock held_locks[30]; 1680 CONFIG_LOCKDEP is turned on in the .config [akpm@linux-foundation.org: fix printk warnings] Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:58 -07:00
Markus Rechberger	4d7bf11d64	ext2/3/4: fix file date underflow on ext2 3 filesystems on 64 bit systems Taken from http://bugzilla.kernel.org/show_bug.cgi?id=5079 signed long ranges from -2,147,483,648 to 2,147,483,647 on x86 32bit. The bit pattern 10000011110110100100111110111101 is -2,082,844,739 as a signed 32-bit value but 2,212,122,557 as an unsigned one. This pattern is what currently gets stored on the disk, but when it is converted to a 64bit signed long value it loses its sign and becomes positive (2,212,122,557). Cc: Andreas Dilger <adilger@dilger.ca> Cc: <linux-ext4@vger.kernel.org> Andreas says: This patch is now treating timestamps with the high bit set as negative times (before Jan 1, 1970). This means we lose 1/2 of the possible range of timestamps (lopping off 68 years before unix timestamp overflow - now only 30 years away :-) to handle the extremely rare case of setting timestamps into the distant past. If we are only interested in fixing the underflow case, we could just limit the values to 0 instead of storing negative values. At worst this will skew the timestamp by a few hours for timezones in the far east (files would still show Jan 1, 1970 in "ls -l" output). That said, it seems 32-bit systems (mine at least) allow files to be set into the past (01/01/1907 works fine) so it seems this patch is bringing the x86_64 behaviour into sync with other kernels. On the plus side, we have a patch that is ready to add nanosecond timestamps to ext3 and as an added bonus adds 2 high bits to the on-disk timestamp so this extends the maximum date to 2242. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:58 -07:00
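The ext2/3/4 underflow comes down to how a 32-bit on-disk field is widened to 64 bits. The following is a minimal sketch using the bit pattern from the commit message. The decode_unsigned() and decode_signed() helpers are hypothetical illustrations, not the ext3 code, and the sketch assumes a two's-complement compiler such as GCC, where converting an out-of-range value to int32_t wraps modulo 2^32 (this is implementation-defined in C before C23).

```c
#include <stdint.h>

/* 10000011110110100100111110111101 from the commit message, in hex. */
static const uint32_t raw_time = 0x83DA4FBDu;

/* Zero-extension: the high bit becomes an ordinary value bit, turning a
 * pre-1970 time into a date far in the future on a 64-bit kernel. */
static int64_t decode_unsigned(uint32_t raw)
{
    return (int64_t)raw;
}

/* Reinterpret as signed 32-bit first, then sign-extend, which preserves
 * the negative value the way 32-bit kernels already did. */
static int64_t decode_signed(uint32_t raw)
{
    return (int64_t)(int32_t)raw;
}
```

With the commit's bit pattern, decode_unsigned() gives 2,212,122,557 and decode_signed() gives -2,082,844,739, matching the two values quoted above.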
Alexey Dobriyan	8948e11f45	Allow access to /proc/$PID/fd after setuid() /proc/$PID/fd has r-x------ permissions, so if a process does setuid(), it will not be able to access /proc/*/fd/. This breaks fstatat() emulation in glibc. open("foo", O_RDONLY|O_DIRECTORY) = 4 setuid32(65534) = 0 stat64("/proc/self/fd/4/bar", 0xbfafb298) = -1 EACCES (Permission denied) Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Acked-By: Kirill Korotaev <dev@openvz.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:58 -07:00
Andrew Morton	7e4c3690b0	block_write_full_page(): report ENOSPC block_write_full_page() forgot to propagate ENOSPC into the address_space. Cc: Guillaume Chazarain <guichaz@yahoo.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:57 -07:00
Guillaume Chazarain	3e9f45bd18	Factor outstanding I/O error handling Cleanup: setting an outstanding error on a mapping was open coded too many times. Factor it out in mapping_set_error(). Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:57 -07:00
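mapping_set_error() records a writeback error on the mapping so that it can be reported later, for example by fsync(), and it keeps ENOSPC distinct from other errors, which are folded into EIO; that distinction is what the preceding ENOSPC fix relies on. The following is a minimal userspace sketch of the helper's logic. struct mapping and the AS_*_BIT flags are hypothetical stand-ins for struct address_space and its flag bits.

```c
#include <errno.h>

/* Hypothetical stand-ins for the address_space error flag bits. */
#define AS_EIO_BIT     (1u << 0)
#define AS_ENOSPC_BIT  (1u << 1)

/* Hypothetical stand-in for struct address_space: only the flags. */
struct mapping { unsigned int flags; };

/* Record a (negative errno) writeback error on the mapping. A zero error
 * is ignored; ENOSPC gets its own sticky bit, all other errors are
 * recorded as EIO. */
static void mapping_set_error(struct mapping *m, int error)
{
    if (error == 0)
        return;
    if (error == -ENOSPC)
        m->flags |= AS_ENOSPC_BIT;
    else
        m->flags |= AS_EIO_BIT;
}
```

Putting the errno-to-flag mapping in one helper keeps every writeback path classifying errors the same way, instead of each caller open-coding it.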
Jeff Dike	f1adc05e77	uml: hostfs style fixes hostfs needed some style goodness. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:57 -07:00
Alberto Bertogli	5822b7faca	uml: make hostfs_setattr() support operations on unlinked open files This patch allows hostfs_setattr() to work on unlinked open files by calling set_attr() (the userspace part) with the inode's fd. Without this, applications that depend on doing attribute changes to unlinked open files will fail. It works by using the fd versions instead of the path ones (for example fchmod() instead of chmod(), fchown() instead of chown()) when an fd is available. Signed-off-by: Alberto Bertogli <albertito@gmail.com> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:57 -07:00
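The hostfs change relies on ordinary POSIX behavior on the host: once an open file has been unlinked, a path-based call such as chmod() can no longer find it, but the fd-based fchmod() still works on the open file. The following is a minimal sketch of that behavior, assuming a writable /tmp; chmod_unlinked_demo() is a hypothetical helper, not hostfs code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a temp file, unlink it while keeping it open, then change its
 * mode through the fd. Returns the resulting permission bits, or -1 if
 * any step did not behave as described. */
static int chmod_unlinked_demo(mode_t new_mode)
{
    char path[] = "/tmp/unlinked-XXXXXX";
    struct stat st;
    int result = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    if (unlink(path) == 0 &&            /* the file no longer has a name */
        chmod(path, new_mode) != 0 &&   /* so the path-based call fails */
        fchmod(fd, new_mode) == 0 &&    /* but the fd-based call works */
        fstat(fd, &st) == 0)
        result = (int)(st.st_mode & 07777);
    close(fd);
    return result;
}
```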
Dmitriy Monakhov	0ceb331433	mm: move common segment checks to separate helper function [akpm@linux-foundation.org: cleanup] Signed-off-by: Monakhov Dmitriy <dmonakhov@openvz.org> Cc: Christoph Hellwig <hch@lst.de> Acked-by: Anton Altaparmakov <aia21@cam.ac.uk> Acked-by: David Chinner <dgc@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:14:57 -07:00
Jens Axboe	86aa5ac53e	[PATCH] splice: always call into page_cache_readahead() Don't try to guess what the read-ahead logic will do, allow it to make its own decisions. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-05-08 08:46:19 +02:00
Fengguang Wu	9ae9d68cbf	[PATCH] splice(): fix interaction with readahead Eric Dumazet, thank you for disclosing this bug. Readahead logic somehow fails to populate the page range with data. It can be because 1) the readahead routine is not always called in the following lines of fs/splice.c: if (!loff || nr_pages > 1) page_cache_readahead(mapping, &in->f_ra, in, index, nr_pages); 2) even when called, page_cache_readahead() won't guarantee the pages are there. It won't submit readahead I/O for pages already in the radix tree, or when (ra_pages == 0), or after 256 cache hits. In your case, it should be because of the retried reads, which lead to excessive cache hits, and disable readahead at some point. And that _one_ failure of readahead blocks the whole read process. The application receives EAGAIN and retries the read, but __generic_file_splice_read() refuses to make progress: - in the previous invocation, it has allocated a blank page and inserted it into the radix tree, but never has the chance to start I/O for it: the test of SPLICE_F_NONBLOCK goes before that. - in the retried invocation, the readahead code will neither get out of the cache hit mode, nor will it submit I/O for an already existing page. Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-05-08 08:44:36 +02:00
Lachlan McIlroy	f7c66ce3f7	[XFS] Add lockdep support for XFS SGI-PV: 963965 SGI-Modid: xfs-linux-melb:xfs-kern:28485a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-05-08 13:50:19 +10:00
Lachlan McIlroy	71dfd5a396	[XFS] Fix race in xfs_write() b/w dmapi callout and direct I/O checks. In xfs_write() the iolock is dropped and reacquired in XFS_SEND_DATA() which means that the file could change from not-cached to cached and we need to redo the direct I/O checks. We should also redo the direct I/O checks when the file size changes, regardless of whether O_APPEND is set. SGI-PV: 963483 SGI-Modid: xfs-linux-melb:xfs-kern:28440a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-05-08 13:50:12 +10:00
Utako Kusaka	3a02ee1828	[XFS] Get rid of redundant "required" in msg. SGI-PV: 963466 SGI-Modid: xfs-linux-melb:xfs-kern:28416a Signed-off-by: Utako Kusaka <utako@tnes.nec.co.jp> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2007-05-08 13:50:06 +10:00
Tim Shimmin	e6a0e9cdff	[XFS] Export via a function xfs_buftarg_list for use by kdb/xfsidbg. SGI-PV: 963465 SGI-Modid: xfs-linux-melb:xfs-kern:28414a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2007-05-08 13:49:59 +10:00
Tim Shimmin	f10bb2dad0	[XFS] Remove unused ilen variable and references. SGI-PV: 907752 SGI-Modid: xfs-linux-melb:xfs-kern:28344a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>	2007-05-08 13:49:53 +10:00
Lachlan McIlroy	ba87ea699e	[XFS] Fix to prevent the notorious 'NULL files' problem after a crash. The problem that has been addressed is that of synchronising updates of the file size with writes that extend a file. Without the fix the update of a file's size, as a result of a write beyond eof, is independent of when the cached data is flushed to disk. Often the file size update would be written to the filesystem log before the data is flushed to disk. When a system crashes between these two events and the filesystem log is replayed on mount the file's size will be set but since the contents never made it to disk the file is full of holes. If some of the cached data was flushed to disk then it may just be a section of the file at the end that has holes. There are existing fixes to help alleviate this problem, particularly in the case where a file has been truncated, that force cached data to be flushed to disk when the file is closed. If the system crashes while the file(s) are still open then this flushing will never occur. The fix that we have implemented is to introduce a second file size, called the in-memory file size, that represents the current file size as viewed by the user. The existing file size, called the on-disk file size, is the one that gets written to the filesystem log and we only update it when it is safe to do so. When we write to a file beyond eof we only update the in-memory file size in the write operation. Later when the I/O operation that flushes the cached data to disk completes, an I/O completion routine will update the on-disk file size. The on-disk file size will be updated to the maximum offset of the I/O or to the value of the in-memory file size if the I/O includes eof. SGI-PV: 958522 SGI-Modid: xfs-linux-melb:xfs-kern:28322a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-05-08 13:49:46 +10:00
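The XFS fix keeps two sizes: write() moves only the in-memory size, and I/O completion advances the on-disk size to the end of the completed I/O, capped at the in-memory size when the I/O covers EOF. The following is a minimal sketch of that completion rule. struct sketch_inode and io_completion_update_size() are hypothetical names, and logging the new size to the filesystem log is left out.

```c
#include <stdint.h>

/* Hypothetical inode carrying the two sizes the commit describes. */
struct sketch_inode {
    int64_t isize;      /* in-memory size, updated by write() */
    int64_t disk_size;  /* on-disk size, updated only after data I/O */
};

/* Called when write-back I/O covering [offset, offset + len) completes.
 * The on-disk size advances to the end of the I/O, but never past the
 * in-memory size (the last block may extend beyond EOF), and it never
 * moves backwards. */
static void io_completion_update_size(struct sketch_inode *ip,
                                      int64_t offset, int64_t len)
{
    int64_t io_end = offset + len;
    int64_t new_size = io_end < ip->isize ? io_end : ip->isize;

    if (new_size > ip->disk_size)
        ip->disk_size = new_size;   /* real code would log this change */
}
```

If the system crashes before an I/O completes, the logged on-disk size never points past data that actually reached the disk. That is the property that prevents the "NULL files" symptom.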
Lachlan McIlroy	2a32963130	[XFS] Fix race condition in xfs_write(). This change addresses a race in xfs_write() where, for direct I/O, the flags need_i_mutex and need_flush are setup before the iolock is acquired. The logic used to setup the flags may change between setting the flags and acquiring the iolock resulting in these flags having incorrect values. For example, if a file is not currently cached then need_i_mutex is set to zero and then if the file is cached before the iolock is acquired we will fail to do the flushinval before the direct write. The flush (and also the call to xfs_zero_eof()) need to be done with the iolock held exclusive so we need to acquire the iolock before checking for cached data (or if the write begins after eof) to prevent this state from changing. For direct I/O I've chosen to always acquire the iolock in shared mode initially and if there is a need to promote it then drop it and reacquire it. There's also some other tidy-ups including removing the O_APPEND offset adjustment since that work is done in generic_write_checks() (and we don't use offset as an input parameter anywhere). SGI-PV: 962170 SGI-Modid: xfs-linux-melb:xfs-kern:28319a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-05-08 13:49:39 +10:00
Kouta Ooizumi	e6d29426bc	[XFS] Fix uquota and oquota enforcement problems. When uquota and oquota (gquota/pquota) are enabled for accounting, both are enforced if either has enforcement active. Conditions: - Both XFS_UQUOTA_ACCT and XFS_GQUOTA_ACCT are enabled. - Either XFS_UQUOTA_ENFD or XFS_OQUOTA_ENFD is enabled. - Usage of the quota type without enforcement reaches the soft limit. Problems: 1. "repquota" shows all grace times even if there is no enforcement. 2. We cannot make a file over a hard limit even if there is no enforcement. SGI-PV: 962291 SGI-Modid: xfs-linux-melb:xfs-kern:28272a Signed-off-by: Kouta Ooizumi <k-ooizumi@tnes.nec.co.jp> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-05-08 13:49:33 +10:00
Lachlan McIlroy	d3cf209476	[XFS] propogate return codes from flush routines This patch handles error return values in fs_flush_pages and fs_flushinval_pages. It changes the prototype of fs_flushinval_pages so we can propagate the errors and handle them at higher layers. I also modified xfs_itruncate_start so that it could propagate the error further. SGI-PV: 961990 SGI-Modid: xfs-linux-melb:xfs-kern:28231a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Stewart Smith <stewart@flamingspork.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-05-08 13:49:27 +10:00
Donald Douwsma	424ea91ba6	[XFS] Fix quotaon syscall failures for group enforcement requests. xfs_qm_scall_quotaon was incorrectly failing requests to enable group quota enforcement. Fixes logic error in OQUOTA handling. SGI-PV: 961964 SGI-Modid: xfs-linux-melb:xfs-kern:28227a Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-05-08 13:49:15 +10:00
Donald Douwsma	646d5bdab3	[XFS] Invalidate quotacheck when mounting without a quota type. When quotas are mounted or remounted without a particular quota type the quota accounting for that type becomes invalid. Previously we were ignoring this leading to accounting errors. SGI-PV: 961964 SGI-Modid: xfs-linux-melb:xfs-kern:28225a Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Utako Kusaka <utako@tnes.nec.co.jp> Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-05-08 13:49:09 +10:00
Joe Perches	e7a23a9b37	[XFS] reducing the number of random number functions. Patch provided by Joe Perches SGI-PV: 961696 SGI-Modid: xfs-linux-melb:xfs-kern:28209a Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-05-08 13:49:03 +10:00
Eric Sandeen	e9ed9d2240	[XFS] remove more misc. unused args Patch provided by Eric Sandeen. SGI-PV: 961695 SGI-Modid: xfs-linux-melb:xfs-kern:28205a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-05-08 13:48:56 +10:00
Eric Sandeen	ef497f8a1e	[XFS] the "aendp" arg to xfs_dir2_data_freescan is always NULL, remove it. Patch provided by Eric Sandeen. SGI-PV: 961694 SGI-Modid: xfs-linux-melb:xfs-kern:28204a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-05-08 13:48:49 +10:00
Eric Sandeen	1c72bf9003	[XFS] The last argument "lsn" of xfs_trans_commit() is always called with NULL. Patch provided by Eric Sandeen. SGI-PV: 961693 SGI-Modid: xfs-linux-melb:xfs-kern:28199a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-05-08 13:48:42 +10:00
David Woodhouse	1c97964520	[JFFS2] Simplify and clean up jffs2_add_tn_to_tree() some more. Fixing at least a couple more bugs in the process. Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-05-08 00:19:54 +01:00
Linus Torvalds	2d56d3c43c	Merge branch 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux * 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux: gfs2: nfs lock support for gfs2 lockd: add code to handle deferred lock requests lockd: always preallocate block in nlmsvc_lock() lockd: handle test_lock deferrals lockd: pass cookie in nlmsvc_testlock lockd: handle fl_grant callbacks lockd: save lock state on deferral locks: add fl_grant callback for asynchronous lock return nfsd4: Convert NFSv4 to new lock interface locks: add lock cancel command locks: allow {vfs,posix}_lock_file to return conflicting lock locks: factor out generic/filesystem switch from setlock code locks: factor out generic/filesystem switch from test_lock locks: give posix_test_lock same interface as ->lock locks: make ->lock release private data before returning in GETLK case locks: create posix-to-flock helper functions locks: trivial removal of unnecessary parentheses	2007-05-07 12:34:24 -07:00
Linus Torvalds	5cefcab3db	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (34 commits) [GFS2] Uncomment sprintf_symbol calling code [DLM] lowcomms style [GFS2] printk warning fixes [GFS2] Patch to fix mmap of stuffed files [GFS2] use lib/parser for parsing mount options [DLM] Lowcomms nodeid range & initialisation fixes [DLM] Fix dlm_lowcoms_stop hang [DLM] fix mode munging [GFS2] lockdump improvements [GFS2] Patch to detect corrupt number of dir entries in leaf and/or inode blocks [GFS2] bz 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump) [DLM] fs/dlm/ast.c should #include "ast.h" [DLM] Consolidate transport protocols [DLM] Remove redundant assignment [GFS2] Fix bz 234168 (ignoring rgrp flags) [DLM] change lkid format [DLM] interface for purge (2/2) [DLM] add orphan purging code (1/2) [DLM] split create_message function [GFS2] Set drop_count to 0 (off) by default ...	2007-05-07 12:26:27 -07:00
Bryan Wu	1394f03221	blackfin architecture This adds support for the Analog Devices Blackfin processor architecture, and currently supports the BF533, BF532, BF531, BF537, BF536, BF534, and BF561 (Dual Core) devices, with a variety of development platforms including those available from Analog Devices (BF533-EZKit, BF533-STAMP, BF537-STAMP, BF561-EZKIT), and Bluetechnix! Tinyboards. The Blackfin architecture was jointly developed by Intel and Analog Devices Inc. (ADI) as the Micro Signal Architecture (MSA) core and introduced in December of 2000. Since then ADI has put this core into its Blackfin processor family of devices. The Blackfin core has the advantages of a clean, orthogonal, RISC-like microprocessor instruction set. It combines a dual-MAC (Multiply/Accumulate), state-of-the-art signal processing engine and single-instruction, multiple-data (SIMD) multimedia capabilities into a single instruction-set architecture. The Blackfin architecture, including the instruction set, is described by the ADSP-BF53x/BF56x Blackfin Processor Programming Reference http://blackfin.uclinux.org/gf/download/frsrelease/29/2549/Blackfin_PRM.pdf The Blackfin processor is already supported by major releases of gcc, and there are binary and source rpms/tarballs for many architectures at: http://blackfin.uclinux.org/gf/project/toolchain/frs There is complete documentation, including "getting started" guides available at: http://docs.blackfin.uclinux.org/ which provides links to the sources and patches you will need in order to set up a cross-compiling environment for bfin-linux-uclibc This patch, as well as the other patches (toolchain, distribution, uClibc) are actively supported by Analog Devices Inc, at: http://blackfin.uclinux.org/ We have tested this on LTP, and our test plan (including pass/fails) can be found at: http://docs.blackfin.uclinux.org/doku.php?id=testing_the_linux_kernel [m.kozlowski@tuxland.pl: balance parenthesis in blackfin header files] Signed-off-by: Bryan Wu <bryan.wu@analog.com> Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: Aubrey Li <aubrey.li@analog.com> Signed-off-by: Jie Zhang <jie.zhang@analog.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:58 -07:00
Akinobu Mita	5bc98594d5	hugetlbfs: add NULL check in hugetlb_zero_setup() If hugetlbfs module_init() fails, hugetlbfs_vfsmount is not initialized and shmget() with SHM_HUGETLB flag will cause NULL pointer dereference. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:57 -07:00
Christoph Lameter	50953fe9e0	slab allocators: Remove SLAB_DEBUG_INITIAL flag I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by SLAB. I think its purpose was to have a callback after an object has been freed to verify that the state is the constructor state again? The callback is performed before each freeing of an object. I would think that it is much easier to check the object state manually before the free. That also places the check near the code that manipulates the object. Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was compiled with SLAB debugging on. If there were code in a constructor handling SLAB_DEBUG_INITIAL then it would have to be conditional on SLAB_DEBUG, otherwise it would just be dead code. But there is no such code in the kernel. I think SLAB_DEBUG_INITIAL is too problematic to make real use of, difficult to understand, and there are easier ways to accomplish the same effect (i.e. add debug code before kfree). There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be clear in fs inode caches. Remove the pointless checks (they would even be pointless without removal of SLAB_DEBUG_INITIAL) from the fs constructors. This is the last slab flag that SLUB did not support. Remove the check for unimplemented flags from SLUB. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:57 -07:00
Benjamin Herrenschmidt	036e08568c	get_unmapped_area handles MAP_FIXED in hugetlbfs Generic hugetlb_get_unmapped_area() now handles MAP_FIXED by just calling prepare_hugepage_range() Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: William Irwin <bill.irwin@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Matthew Wilcox <willy@debian.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:57 -07:00
Christoph Lameter	0a31bd5f2b	KMEM_CACHE(): simplify slab cache creation This patch provides a new macro KMEM_CACHE(<struct>, <flags>) to simplify slab creation. KMEM_CACHE creates a slab with the name of the struct, with the size of the struct and with the alignment of the struct. Additional slab flags may be specified if necessary. Example struct test_slab { int a,b,c; struct list_head list; } __cacheline_aligned_in_smp; test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC) will create a new slab named "test_slab" of the size sizeof(struct test_slab) and aligned to the alignment of struct test_slab. If it fails then we panic. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:55 -07:00
Peter Zijlstra	96018fdacb	mm: optimize acorn partition truncate invalidate_bdev() is superfluous when truncate_inode_pages() is also called. Do call invalidate_bh_lrus() though, to avoid stale pointers. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:55 -07:00
Peter Zijlstra	f9a14399ae	mm: optimize kill_bdev() Remove duplicate work in kill_bdev(). It currently invalidates and then truncates the bdev's mapping. invalidate_mapping_pages() will opportunistically remove pages from the mapping. And truncate_inode_pages() will forcefully remove all pages. The only thing truncate doesn't do is flush the bh lrus. So do that explicitly. This avoids (very unlikely but possible) invalid lookup results if the same bdev is quickly re-issued. It will also prevent the extreme kernel latencies observed when block devices with a large amount of pagecache are unmounted, by avoiding invalidate_mapping_pages() on that path. invalidate_mapping_pages() has no cond_resched (it can be called under spinlock), whereas truncate_inode_pages() has one. [akpm@linux-foundation.org: restore nrpages==0 optimisation] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:55 -07:00
Peter Zijlstra	f98393a64c	mm: remove destroy_dirty_buffers from invalidate_bdev() Remove the destroy_dirty_buffers argument from invalidate_bdev(); it hasn't been used in 6 years (so akpm says). find * -name \*.[ch] | xargs grep -l invalidate_bdev | while read file; do quilt add $file; sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file; done Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:55 -07:00
Christoph Lameter	d85f33855c	Make page->private usable in compound pages If we add a new flag so that we can distinguish between the first page and the tail pages then we can avoid using page->private in the first page. page->private == page for the first page, so there is no real information in there. Freeing up page->private makes the use of compound pages more transparent. They become more usable like real pages. Right now we have to be careful, e.g. if we are going beyond PAGE_SIZE allocations in the slab on i386, because we can then no longer use the private field. This is one of the issues that cause us not to support debugging for page size slabs in SLAB. Having page->private available for SLUB would allow more meta information in the page struct. I can probably avoid the 16 bit ints that I have in there right now. Also if page->private is available then a compound page may be equipped with buffer heads. This may clear the way for filesystems to support larger blocks than page size. We add PageTail as an alias of PageReclaim. Compound pages cannot currently be reclaimed. Because of the alias one needs to check PageCompound first. The RFC for this approach was discussed at http://marc.info/?t=117574302800001&r=1&w=2 [nacc@us.ibm.com: fix hugetlbfs] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:53 -07:00
David Rientjes	b813e931b4	smaps: add clear_refs file to clear reference Adds /proc/pid/clear_refs. When any non-zero number is written to this file, pte_mkold() and ClearPageReferenced() are called for each pte and its corresponding page, respectively, in that task's VMAs. This file is only writable by the user who owns the task. It is now possible to measure _approximately_ how much memory a task is using by clearing the reference bits with echo 1 > /proc/pid/clear_refs and checking the reference count for each VMA from the /proc/pid/smaps output at a measured time interval. For example, to observe the approximate change in memory footprint for a task, write a script that clears the references (echo 1 > /proc/pid/clear_refs), sleeps, and then greps for Pgs_Referenced and extracts the size in kB. Add the sizes for each VMA together for the total referenced footprint. Moments later, repeat the process and observe the difference. For example, using an efficient Mozilla (accumulated time: referenced memory): 0 s: 408 kB; 1 s: 408 kB; 2 s: 556 kB; 3 s: 1028 kB; 4 s: 872 kB; 5 s: 1956 kB; 6 s: 416 kB; 7 s: 1560 kB; 8 s: 2336 kB; 9 s: 1044 kB; 10 s: 416 kB. This is a valuable tool to get an approximate measurement of the memory footprint for a task. Cc: Hugh Dickins <hugh@veritas.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> [akpm@linux-foundation.org: build fixes] [mpm@selenic.com: rename for_each_pmd] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:52 -07:00
David Rientjes	f79f177c25	smaps: add pages referenced count to smaps Adds an additional unsigned long field to struct mem_size_stats called 'referenced'. For each pte walked in the smaps code, this field is incremented by PAGE_SIZE if it has pte-reference bits. An additional line was added to the /proc/pid/smaps output for each VMA to indicate how many pages within it are currently marked as referenced or accessed. Cc: Hugh Dickins <hugh@veritas.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:52 -07:00
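The measurement loop described in commit b813e931b4 writes 1 to /proc/pid/clear_refs, sleeps, and then adds up the referenced size of every VMA in /proc/pid/smaps. Below is a minimal C sketch of only that last summing step. It assumes the smaps text has already been read into a string. The function name `sum_smaps_field` is hypothetical. The field label is passed in as a parameter because the commit text greps for `Pgs_Referenced`, while later kernels print the line as `Referenced:`, so check which label your kernel emits.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel or libc API): sum the kB values of
 * every line in smaps-format text that begins with `field`, e.g.
 * "Referenced:". A matching line is assumed to look like
 * "Referenced:        408 kB". Only line starts are compared, so a
 * "Pgs_Referenced:" line is not counted when field is "Referenced:".
 */
unsigned long sum_smaps_field(const char *text, const char *field)
{
	size_t flen;
	unsigned long total = 0;
	const char *line = text;

	assert(text != NULL && field != NULL);
	flen = strlen(field);

	while (*line != '\0') {
		const char *nl = strchr(line, '\n');

		if (strncmp(line, field, flen) == 0) {
			const char *p = line + flen;
			unsigned long v = 0;

			/* skip column padding, then parse the decimal kB value */
			while (*p == ' ' || *p == '\t')
				p++;
			while (*p >= '0' && *p <= '9')
				v = v * 10 + (unsigned long)(*p++ - '0');
			total += v;
		}
		if (nl == NULL)
			break;          /* last line had no trailing newline */
		line = nl + 1;          /* advance to the next line */
	}
	return total;
}
```

To follow the procedure in the commit, you would call `sum_smaps_field(buf, "Referenced:")` on each snapshot taken after clearing the bits and compare the totals over time. The parser stops at the first non-digit, so the trailing " kB" unit is ignored.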

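Commit 0a31bd5f2b describes KMEM_CACHE() as deriving a slab cache's name, size and alignment from the struct itself, so only the struct tag and flags are passed. The user-space sketch below imitates that expansion so it can be compiled and checked outside the kernel. It is not the kernel's code: `mock_kmem_cache_create`, `struct mock_cache`, `MOCK_KMEM_CACHE` and `struct list_node` are hypothetical stand-ins, and `_Alignas(64)` approximates `__cacheline_aligned_in_smp` under an assumed 64-byte cache line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for kmem_cache_create(): it only records its
 * arguments so the macro expansion can be inspected. */
struct mock_cache {
	const char *name;
	size_t size;
	size_t align;
	unsigned long flags;
};

struct mock_cache mock_kmem_cache_create(const char *name, size_t size,
					 size_t align, unsigned long flags)
{
	struct mock_cache c;

	/* Sanity checks for this mock only: non-empty name and a
	 * power-of-two alignment. */
	assert(name != NULL && strlen(name) > 0);
	assert(align != 0 && (align & (align - 1)) == 0);

	c.name = name;
	c.size = size;
	c.align = align;
	c.flags = flags;
	return c;
}

/* The idea described in the commit: #tag stringifies the struct tag to
 * form the cache name, and size/alignment come from "struct tag" itself
 * (C11 _Alignof here). */
#define MOCK_KMEM_CACHE(tag, flags)                             \
	mock_kmem_cache_create(#tag, sizeof(struct tag),        \
			       _Alignof(struct tag), (flags))

/* Hypothetical list node standing in for the kernel's struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

/* Mirrors the commit's example struct; _Alignas(64) on the first member
 * gives the whole struct 64-byte alignment. */
struct test_slab {
	_Alignas(64) int a;
	int b, c;
	struct list_node list;
};
```

Because the macro takes the bare tag rather than a type, one token supplies both the cache name string and the `struct test_slab` used for `sizeof` and alignment. In this sketch, `MOCK_KMEM_CACHE(test_slab, 0x1UL)` records a cache named "test_slab" whose size is rounded up to 64 bytes by the alignment.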

6168 Commits