linux

Author	SHA1	Message	Date
Benny Halevy	3a10c30acc	nfs: obliterate NFS_FLAGS macro use NFS_I(inode)->flags instead Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:11 -05:00
Chuck Lever	fc6014771b	NFS: Address memory leaks in the NFS client mount option parser David Howells noticed that repeating the same mount option twice during an NFS mount request can result in orphaned memory in certain cases. Only the client_address and mount_server.hostname strings are initialized in the mount parsing loop, so those appear to be the only two pointers that might be written over by repeating a mount option. The strings in the nfs_server section of the nfs_parsed_mount_data structure are set only once after the options are parsed, thus these are not susceptible to being overwritten. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:11 -05:00
J. Bruce Fields	3d1c550874	nfs4: allow nfsv4 acls on non-regular-files The rfc doesn't give any reason it shouldn't be possible to set an attribute on a non-regular file. And if the server supports it, then it shouldn't be up to us to prevent it. Thanks to Erez for the report and Trond for further analysis. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Tested-by: Erez Zadok <ezk@cs.sunysb.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:10 -05:00
Trond Myklebust	f3c391e89c	NFS: Optimise away the sigmask code in aio/dio reads and writes There are no interruptible waits for asynchronous RPC tasks, so we don't need to wrap calls to rpc_run_task() with an rpc_clnt_sigmask/rpc_clnt_unsigmask pair. Instead we can wrap the wait_for_completion_interruptible() in nfs_direct_wait(). This means that we completely optimise away sigmask setting for the case of non-blocking aio/dio. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:10 -05:00
Trond Myklebust	65fdf7d264	NLM: Fix a bogus 'return' in nlmclnt_rpc_release Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:08 -05:00
Chuck Lever	883bb163f8	NLM: Introduce an arguments structure for nlmclnt_init() Clean up: pass 5 arguments to nlmclnt_init() in a structure similar to the new nfs_client_initdata structure. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2008-01-30 02:06:07 -05:00
Chuck Lever	1093a60ef3	NLM/NFS: Use cached nlm_host when calling nlmclnt_proc() Now that each NFS mount point caches its own nlm_host structure, it can be passed to nlmclnt_proc() for each lock request. By pinning an nlm_host for each mount point, we trade the overhead of looking up or creating a fresh nlm_host struct during every NLM procedure call for a little extra memory. We also restrict the nlmclnt_proc symbol to limit the use of this call to in-tree modules. Note that nlm_lookup_host() (just removed from the client's per-request NLM processing) could also trigger an nlm_host garbage collection. Now client-side nlm_host garbage collection occurs only during NFS mount processing. Since the NFS client now holds a reference on these nlm_host structures, they wouldn't have been affected by garbage collection anyway. Given that nlm_lookup_host() reorders the global nlm_host chain after every successful lookup, and that a garbage collection could be triggered during the call, we've removed a significant amount of per-NLM-request CPU processing overhead. Sidebar: there are only a few remaining references to the internals of NFS inodes in the client-side NLM code. The only references I found are related to extracting or comparing the inode's file handle via NFS_FH(). One is in nlmclnt_grant(); the other is in nlmclnt_setlockargs(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:07 -05:00
Chuck Lever	9289e7f91a	NFS: Invoke nlmclnt_init during NFS mount processing Cache an appropriate nlm_host structure in the NFS client's mount point metadata for later use. Note that there is no need to set NFS_MOUNT_NONLM in the error case -- if nfs_start_lockd() returns a non-zero value, its callers ensure that the mount request fails outright. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:07 -05:00
Chuck Lever	52c4044d00	NLM: Introduce external nlm_host set-up and tear-down functions We would like to remove the per-lock-operation nlm_lookup_host() call from nlmclnt_proc(). The new architecture pins an nlm_host structure to each NFS client superblock that has the "lock" mount option set. The NFS client passes in the pinned nlm_host structure during each call to nlmclnt_proc(). NFS client unmount processing "puts" the nlm_host so it can be garbage- collected later. This patch introduces externally callable NLM functions that handle mount-time nlm_host set up and tear-down. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:06 -05:00
Chuck Lever	cab6fc1b77	lockd: Eliminate harmless mixed sign comparison in nlmdbg_cookie2a() The cookie->len field is unsigned, so the loop index variable in nlmdbg_cookie2a() should also be unsigned. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:02 -05:00
Chuck Lever	3d509e5454	NFS: nfs_write_end clean up Clean up: commit `4899f9c8` added nfs_write_end(), which introduces a conditional expression that returns an unsigned integer in one arm and a signed integer in the other. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:02 -05:00
Chuck Lever	bf4285e75c	NFS: Fix minor mixed sign comparison in NFS client's write logic Clean up: PAGE_CACHE_SIZE is unsigned, and nfs_pageio_init() takes a size_t. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:01 -05:00
Chuck Lever	d24aae41b4	NFS: Use size_t for storing name lengths Clean up: always use the same type when handling buffer lengths. As a bonus, this prevents a mixed sign comparison in idmap_lookup_name. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:01 -05:00
Chuck Lever	a661b77fc1	NFS: Fix use of copy_to_user() in idmap_pipe_upcall The idmap_pipe_upcall() function expects the copy_to_user() function to return a negative error value if the call fails, but copy_to_user() returns an unsigned long number of bytes that couldn't be copied. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:01 -05:00
Chuck Lever	369af0f116	NFS: Clean up fs/nfs/idmap.c Clean up white space damage and use standard kernel coding conventions for return statements. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:00 -05:00
Trond Myklebust	59dca3b28c	NFS: Fix the 'proto=' mount option Currently, if you have a server mounted using networking protocol, you cannot specify a different value using the 'proto=' option on another mountpoint. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:06:00 -05:00
Trond Myklebust	331702337f	NFS: Support per-mountpoint timeout parameters. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:59 -05:00
Trond Myklebust	7a3e3e18e4	NFS: Ensure that we respect NFS_MAX_TCP_TIMEOUT It isn't sufficient just to limit timeout->to_initval, we also need to limit to_maxval. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:59 -05:00
Trond Myklebust	69dd716c5f	NFSv4: Add socket proto argument to setclientid Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:58 -05:00
Chuck Lever	3c7c7e4812	NFS: Pull covers off IPv6 address parsing Now that the needed IPv6 infrastructure is in place, allow the NFS client's IP address parser to generate AF_INET6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:57 -05:00
Chuck Lever	4c56801770	NFS: Support non-IPv4 addresses in nfs_parsed_mount_data Replace the nfs_server and mount_server address fields in the nfs_parsed_mount_data structure with a "struct sockaddr_storage" instead of a "struct sockaddr_in". Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:57 -05:00
Chuck Lever	9412b92772	NFS: Refactor mount option address parsing into separate function Refactor the logic to parse incoming text-based IP addresses. Use the in4_pton() function instead of the older in_aton(), following the lead of the in-kernel CIFS client. Later we'll add IPv6 address parsing using the matching in6_pton() function. For now we can't allow IPv6 address parsing: we must expand the size of the address storage fields in the nfs_parsed_mount_options struct before we can parse and store IPv6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:56 -05:00
Chuck Lever	338320345b	NFS: Remove the NIPQUAD from nfs_try_mount In the name of address family compatibility, we can't have the NIP_FMT and NIPQUAD macros in nfs_try_mount(). Instead, we can make use of an unused mount option to display the mount server's hostname. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:56 -05:00
Chuck Lever	6677d09513	NFS: Adjust nfs_clone_mount structure to store "struct sockaddr " Change the addr field in the nfs_clone_mount structure to store a "struct sockaddr " to support non-IPv4 addresses in the NFS client. Note this is mostly a cosmetic change, and does not actually allow referrals using IPv6 addresses. The existing referral code assumes that the server returns a string that represents an IPv4 address. This code needs to support hostnames and IPv6 addresses as well as IPv4 addresses, thus it will need to be reorganized completely (to handle DNS resolution in user space). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:56 -05:00
Chuck Lever	dcecae0ff4	NFS: Change nfs4_set_client() to accept struct sockaddr * Adjust the arguments and callers of nfs4_set_client() to pass a "struct sockaddr " instead of a "struct sockaddr_in " to support non-IPv4 addresses in the NFS client. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:56 -05:00
Chuck Lever	d7422c472b	NFS: Change nfs_get_client() to take sockaddr * Adjust arguments and callers of nfs_get_client() to pass a "struct sockaddr " instead of "struct sockaddr_in " to support non-IPv4 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:55 -05:00
Chuck Lever	ff052645c9	NFS: Change nfs_find_client() to take "struct sockaddr " Adjust arguments and callers of nfs_find_client() to pass a "struct sockaddr " instead of "struct sockaddr_in *" to support non-IPv4 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Trond: Also fix up protocol version number argument in nfs_find_client() to use the correct u32 type. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:55 -05:00
Chuck Lever	c1d3586656	NFS: Change cb_recallargs to pass "struct sockaddr " instead of sockaddr_in Change the addr field in the cb_recallargs struct to a "struct sockaddr " to support non-IPv4 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:55 -05:00
Chuck Lever	671beed7e2	NFS: Change cb_getattrargs to pass "struct sockaddr " instead of sockaddr_in Change the addr field in the cb_getattrargs struct to a "struct sockaddr " to support non-IPv4 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:54 -05:00
Chuck Lever	6e4cffd7b2	NFS: Expand server address storage in nfs_client struct Prepare for managing larger addresses in the NFS client by widening the nfs_client struct's cl_addr field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> (Modified to work with the new parameters for nfs_alloc_client) Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:54 -05:00
Trond Myklebust	3b0d3f93d0	NFS: Add support for AF_INET6 addresses in __nfs_find_client() Introduce AF_INET6-specific address checking to __nfs_find_client(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:54 -05:00
Chuck Lever	0d0f0c192d	NFS: Set default port for NFSv4, with support for AF_INET6 Create a helper function to set the default NFS port for NFSv4 mount points. The helper supports both AF_INET and AF_INET6 family addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:54 -05:00
Chuck Lever	04dcd6e3ac	NFS: Make setting a port number agostic We'll need to set the port number of an AF_INET or AF_INET6 address in several places in fs/nfs/super.c, so introduce a helper that can manage this for us. We put this helper to immediate use. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:53 -05:00
Chuck Lever	cdcd7f9abc	NFS: Verify IPv6 addresses properly Add support to nfs_verify_server_address for recognizing AF_INET6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:53 -05:00
Chuck Lever	fd00a8ff8e	NFS: Add support for AF_INET6 addresses in nfs_compare_super() Refactor nfs_compare_super() and add AF_INET6 support. Replace the generic memcmp() to document explicitly what parts of the addresses must match in this check, and make the comparison independent of the lengths of both addresses. A side benefit is both tests are more computationally efficient than a memcmp(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:53 -05:00
Chuck Lever	3f43c6667a	NFS: Address a couple of nits in nfs_follow_referral() Clean up: fix an outdated block comment, and address a comparison between a signed and unsigned integer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:52 -05:00
Chuck Lever	1d98fe6717	NFS: Move dprintks from callback.c to callback_proc.c Clean up: The client side peer address is available in callback_proc.c, so move a dprintk out of fs/nfs/callback.c and into fs/nfs/callback_proc.c. This is more consistent with other debugging messages, and the proc routines have more information about each request to display. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:52 -05:00
Chuck Lever	5d8515caeb	NFS: eliminate NIPQUAD(clp->cl_addr.sin_addr) To ensure the NFS client displays IPv6 addresses properly, replace address family-specific NIPQUAD() invocations with a call to the RPC client to get a formatted string representing the remote peer's address. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:52 -05:00
Chuck Lever	d4d3c50749	NFS: Enable NFS client to generate CLIENTID strings with IPv6 addresses We recently added methods to RPC transports that provide string versions of the remote peer address information. Convert the NFSv4 SETCLIENTID procedure to use those methods instead of building the client ID out of whole cloth. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:51 -05:00
Chuck Lever	cc38bac3a0	NFS: Ensure NFSv4 SETCLIENTID send buffer is large enough Ensure that the RPC buffer size specified for NFSv4 SETCLIENTID procedures matches what we are encoding into the buffer. See the definition of struct nfs4_setclientid {} and the encode_setclientid() function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:51 -05:00
Trond Myklebust	40c553193d	NFS: Remove the redundant nfs_client->cl_nfsversion We can get the same information from the rpc_ops structure instead. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:49 -05:00
Trond Myklebust	c81468a1a7	NFS: Clean up the nfs_find_client function. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:48 -05:00
Trond Myklebust	3a498026ee	NFS: Clean up the nfs_client initialisation Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:48 -05:00
Trond Myklebust	bfc69a4566	NFS: define a function to update nfsi->cache_change_attribute Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:47 -05:00
Chuck Lever	5cce428d95	NFS: Remove an unneeded check in decode_compound_header_arg() Clean up: The header tag length is unsigned, so checking that it is less than zero is unnecessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:46 -05:00
Chuck Lever	d45273ed6f	NFS: Clean up address comparison in __nfs_find_client() The address comparison in the __nfs_find_client() function is deceptive. It uses a memcmp() to check a pair of u32 fields for equality. Not only is this inefficient, but usually memcmp() is used for comparing two whole sockaddr_in's (which includes comparisons of the address family and port number), so it's easy to mistake the comparison here for a whole sockaddr comparison, which it isn't. So for clarity and efficiency, we replace the memcmp() with a simple test for equality between the two s_addr fields. This should have no behavioral effect. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:46 -05:00
Chuck Lever	6a0ed1de8e	NFS: Clean up: copy hostname with kstrndup during mount processing Clean up: mount option parsing uses kstrndup in several places, rather than using kzalloc. Replace the few remaining uses of kzalloc with kstrndup, for consistency. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:46 -05:00
Chuck Lever	e887cbcf91	NFS: Remove support for the 'mountprog' option Remove the mount option that allows users to specify an alternate mountd program number. The client hasn't support setting an alternate mountd program number for a very long time. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:46 -05:00
Chuck Lever	ad879cef85	NFS: Remove support for the 'nfsprog' option Remove the mount option that allows users to specify an alternate NFS program number. The client hasn't support setting an alternate NFS program number for a very long time. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:45 -05:00
Chuck Lever	0eb2574121	NFS: Ensure that NFS version 4 mounts use NFS_PORT if nfsport wasn't set Text-based mount option parsing introduced a minor regression in the behavior of NFS version 4 mounts. NFS version 4 is not supposed to require a running rpcbind service on the server in order for a mount to succeed. In other words, if the mount options don't specify a port number, the port number is supposed to default to 2049. For earlier versions of NFS, the default port number was zero in order to cause the RPC client to autobind to the server's NFS service. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:45 -05:00
Chuck Lever	28c494c5c8	NFS: Prevent nfs_getattr() hang during heavy write workloads POSIX requires that ctime and mtime, as reported by the stat(2) call, reflect the activity of the most recent write(2). To that end, nfs_getattr() flushes pending dirty writes to a file before doing a GETATTR to allow the NFS server to set the file's size, ctime, and mtime properly. However, nfs_getattr() can be starved when a constant stream of application writes to a file prevents nfs_wb_nocommit() from completing. This usually results in hangs of programs doing a stat against an NFS file that is being written. "ls -l" is a common victim of this behavior. To prevent starvation, hold the file's i_mutex in nfs_getattr() to freeze applications writes temporarily so the client can more quickly obtain clean values for a file's size, mtime, and ctime. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:45 -05:00
Chuck Lever	464ad6b1ad	NFS: Change sign of some loop indices in nfs4xdr.c Nit: Eliminate some mixed sign comparisons in loop indices. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:44 -05:00
Chuck Lever	bcecff77a9	NFS: Use unsigned intermediates for manipulating header lengths (NFSv4 XDR) Clean up: prevent length underflow and mixed sign comparison when unmarshalling NFS version 4 getacl, readdir, and readlink replies. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:44 -05:00
Chuck Lever	c957c526ef	NFS: Use unsigned intermediates for manipulating header lengths (NFSv3 XDR) Clean up: prevent length underflow and mixed sign comparisons when unmarshalling NFS version 3 read, readdir, and readlink replies. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:44 -05:00
Chuck Lever	6232dbbcff	NFS: Use unsigned intermediates for manipulating header lengths (NFSv2 XDR) Clean up: prevent length underflow and mixed sign comparisons when unmarshalling NFS version 2 read, readdir, and readlink replies. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:44 -05:00
Chuck Lever	8a8c74bf94	NFS: Ensure nfs_wcc_update_inode always converts file size to loff_t The nfs_wcc_update_inode() function omits logic to convert the type of the NFS on-the-wire value of a file's size (__u64) to the type of file size value stored in struct inode (loff_t, which is signed). Everywhere else in the NFS client I checked already correctly converts the file size type. This effects only very large files. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:43 -05:00
Trond Myklebust	0773769191	NFS/SUNRPC: Convert users of rpc_init_task+rpc_execute to rpc_run_task() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:39 -05:00
Trond Myklebust	5138fde011	NFS/SUNRPC: Convert all users of rpc_call_setup() Replace use of rpc_call_setup() with rpc_init_task(), and in cases where we need to initialise task->tk_action, with rpc_call_start(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:32 -05:00
Trond Myklebust	bdc7f021f3	NFS: Clean up the (commit\|read\|write)_setup() callback routines Move the common code for setting up the nfs_write_data and nfs_read_data structures into fs/nfs/read.c, fs/nfs/write.c and fs/nfs/direct.c. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:32 -05:00
Trond Myklebust	3ff7576dda	SUNRPC: Clean up the initialisation of priority queue scheduling info. We want the default scheduling priority (priority == 0) to remain RPC_PRIORITY_NORMAL. Also ensure that the priority wait queue scheduling is per process id instead of sometimes being per thread, and sometimes being per inode. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:30 -05:00
Trond Myklebust	c970aa85e7	SUNRPC: Clean up rpc_run_task Make it use the new task initialiser structure instead of acting as a wrapper. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:30 -05:00
Trond Myklebust	84115e1cd4	SUNRPC: Cleanup of rpc_task initialisation Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:30 -05:00
Steve Dickson	ef818a28fa	NFS: Stop sillyname renames and unmounts from racing Added an active/deactive mechanism to the nfs_server structure allowing async operations to hold off umount until the operations are done. Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:24 -05:00
Trond Myklebust	2f74c0a056	NFSv4: Clean up the OPEN/CLOSE serialisation code Reduce the time spent locking the rpc_sequence structure by queuing the nfs_seqid only when we are ready to take the lock (when calling nfs_wait_on_sequence). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:24 -05:00
Trond Myklebust	acee478afc	NFS: Clean up the write request locking. Ensure that we set/clear NFS_PAGE_TAG_LOCKED when the nfs_page is hashed. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:24 -05:00
Trond Myklebust	8b1f9ee56e	NFS: Optimise nfs_vm_page_mkwrite() The current model locks the page twice for no good reason. Optimise by inlining the parts of nfs_write_begin()/nfs_write_end() that we care about. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:23 -05:00
Trond Myklebust	77f111929d	NFS: Ensure that we eject stale inodes as soon as possible Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:22 -05:00
Trond Myklebust	d45b9d8baf	NFS: Handle -ENOENT errors in unlink()/rmdir()/rename() If the server returns an ENOENT error, we still need to do a d_delete() in order to ensure that the dentry is deleted. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:22 -05:00
Trond Myklebust	609005c319	NFS: Sillyrename: in the case of a race, check aliases are really positive In nfs_do_call_unlink() we check that we haven't raced, and that lookup() hasn't created an aliased dentry to our sillydeleted dentry. If somebody has deleted the file on the server and the lookup() resulted in a negative dentry, then ignore... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:22 -05:00
Trond Myklebust	fccca7fc6a	NFS: Fix a sillyrename race... Ensure that readdir revalidates its data cache after blocking on sillyrename. Also fix a typo in nfs_do_call_unlink(): swap the ^= for an \|=. The result is the same, since we've already checked that the flag is unset, but it makes the code more readable. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-01-30 02:05:21 -05:00
Patrick Caulfeld	39bd4177dd	dlm: close othercons This patch addresses a problem introduced with the last round of lowcomms patches where the 'othercon' connections do not get freed when the DLM shuts down. This results in the error message "slab error in kmem_cache_destroy(): cache `dlm_conn': Can't free all objects" and the DLM cannot be restarted without a system reboot. See bz#428119 Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-29 17:17:32 -06:00
David Teigland	52bda2b5ba	dlm: use dlm prefix on alloc and free functions The dlm functions in memory.c should use the dlm_ prefix. Also, use kzalloc/kfree directly for dlm_direntry's, removing the wrapper functions. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-29 17:17:19 -06:00
David Teigland	11b2498ba7	dlm: don't print common non-errors Change log_error() to log_debug() for conditions that can occur in large number in normal operation. Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-29 17:17:08 -06:00
Adrian Bunk	e028398da7	dlm: proper prototypes This patch adds a proper prototype for some functions in fs/dlm/dlm_internal.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-29 17:16:52 -06:00
Lon Hohberger	6bd8fedaa1	dlm: bind connections from known local address when using TCP A common problem occurs when multiple IP addresses within the same subnet are assigned to the same NIC. If we make a connection attempt to another address on the same subnet as one of those addresses, the connection attempt will not necessarily be routed from the address we want. In the case of the DLM, the other nodes will quickly drop the connection attempt, causing problems. This patch makes the DLM bind to the local address it acquired from the cluster manager when using TCP prior to making a connection, obviating the need for administrators to "fix" their systems or use clever routing tricks. Signed-off-by: Lon Hohberger <lhh@redhat.com> Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2008-01-29 16:44:25 -06:00
Jens Axboe	9e97198dbf	splice: fix problem with atime not being updated A bug report on nfsd that states that since it was switched to use splice instead of sendfile, the atime was no longer being updated on the input file. do_generic_mapping_read() does this when accessing the file, make splice do it for the direct splice handler. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-29 21:55:20 +01:00
Linus Torvalds	0ba6c33bcd	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25: (1470 commits) [IPV6] ADDRLABEL: Fix double free on label deletion. [PPP]: Sparse warning fixes. [IPV4] fib_trie: remove unneeded NULL check [IPV4] fib_trie: More whitespace cleanup. [NET_SCHED]: Use nla_policy for attribute validation in ematches [NET_SCHED]: Use nla_policy for attribute validation in actions [NET_SCHED]: Use nla_policy for attribute validation in classifiers [NET_SCHED]: Use nla_policy for attribute validation in packet schedulers [NET_SCHED]: sch_api: introduce constant for rate table size [NET_SCHED]: Use typeful attribute parsing helpers [NET_SCHED]: Use typeful attribute construction helpers [NET_SCHED]: Use NLA_PUT_STRING for string dumping [NET_SCHED]: Use nla_nest_start/nla_nest_end [NET_SCHED]: Propagate nla_parse return value [NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get [NET_SCHED]: act_api: use nlmsg_parse [NET_SCHED]: act_api: fix netlink API conversion bug [NET_SCHED]: sch_netem: use nla_parse_nested_compat [NET_SCHED]: sch_atm: fix format string warning [NETNS]: Add namespace for ICMP replying code. ...	2008-01-29 22:54:01 +11:00
Linus Torvalds	5ea293a904	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild * git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (79 commits) Remove references to "make dep" kconfig: document use of HAVE_* Introduce new section reference annotations tags: __ref, __refdata, __refconst kbuild: warn about ld added unique sections kbuild: add verbose option to Section mismatch reporting in modpost kconfig: tristate choices with mixed tristate and boolean values asm-generic/vmlix.lds.h: simplify __mem{init,exit}* dependencies remove __attribute_used__ kbuild: support ARCH=x86 in buildtar kconfig: remove "enable" kbuild: simplified warning report in modpost kbuild: introduce a few helpers in modpost kbuild: use simpler section mismatch warnings in modpost kbuild: link vmlinux.o before kallsyms passes kbuild: introduce new option to enhance section mismatch analysis Use separate sections for __dev/__cpu/__mem code/data compiler.h: introduce __section() all archs: consolidate init and exit sections in vmlinux.lds.h kbuild: check section names consistently in modpost kbuild: introduce blacklisting in modpost ...	2008-01-29 22:46:14 +11:00
Linus Torvalds	8cd226ca3f	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits) jbd2: sparse pointer use of zero as null jbd2: Use round-jiffies() function for the "5 second" ext4/jbd2 wakeup jbd2: Mark jbd2 slabs as SLAB_TEMPORARY jbd2: add lockdep support ext4: Use the ext4_ext_actual_len() helper function ext4: fix uniniatilized extent splitting error ext4: Check for return value from sb_set_blocksize ext4: Add stripe= option to /proc/mounts ext4: Enable the multiblock allocator by default ext4: Add multi block allocator for ext4 ext4: Add new functions for searching extent tree ext4: Add ext4_find_next_bit() ext4: fix up EXT4FS_DEBUG builds ext4: Fix ext4_show_options to show the correct mount options. ext4: Add EXT4_IOC_MIGRATE ioctl ext4: Add inode version support in ext4 vfs: Add 64 bit i_version support ext4: Add the journal checksum feature jbd2: jbd2 stats through procfs ext4: Take read lock during overwrite case. ...	2008-01-29 22:43:38 +11:00
Mingming Cao	4019191be7	jbd2: sparse pointer use of zero as null Get rid of sparse related warnings from places that use integer as NULL pointer. (Ported from upstream ext3/jbd changes.) Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Mingming Cao	db857da336	jbd2: Use round-jiffies() function for the "5 second" ext4/jbd2 wakeup While "every 5 seconds" doesn't sound as a problem, there can be many of these (and these timers do add up over all the kernel). The "5 second" wakeup isn't really timing sensitive; in addition even with rounding it'll still happen every 5 seconds (with the exception of the very first time, which is likely to be rounded up to somewhere closer to 6 seconds) (Ported from similar JBD patch made by Arjan van de Ven to fs/jbd/transaction.c) Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Mingming Cao	77160957e2	jbd2: Mark jbd2 slabs as SLAB_TEMPORARY This patch marks slab allocations by jbd2 as short-lived in support of Mel Gorman's "Group short-lived and reclaimable kernel allocations" patch. (Ported from similar changes made to fs/jbd/journal.c and fs/jbd/revoke.c in Mel's patch.) Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Mingming Cao	7b7510662f	jbd2: add lockdep support Ported from similar patch for the jbd layer. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	b939e3766e	ext4: Use the ext4_ext_actual_len() helper function ext4 uses the high bit of the extent length to encode whether the extent is intialized or not. The helper function ext4_ext_get_actual_len should be used to get the actual length of the extent. This addresses the kernel bug documented here: http://bugzilla.kernel.org/show_bug.cgi?id=9732 kernel BUG at fs/ext4/extents.c:1056! .... Call Trace: [<ffffffff88366073>] :ext4dev:ext4_ext_get_blocks+0x5ba/0x8c1 [<ffffffff81053c91>] lock_release_holdtime+0x27/0x49 [<ffffffff812748f6>] _spin_unlock+0x17/0x20 [<ffffffff883400a6>] :jbd2:start_this_handle+0x4e0/0x4fe [<ffffffff88366564>] :ext4dev:ext4_fallocate+0x175/0x39a [<ffffffff81053c91>] lock_release_holdtime+0x27/0x49 [<ffffffff81056480>] __lock_acquire+0x4e7/0xc4d [<ffffffff81053c91>] lock_release_holdtime+0x27/0x49 [<ffffffff810a8de7>] sys_fallocate+0xe4/0x10d [<ffffffff8100c043>] tracesys+0xd5/0xda Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Dmitry Monakhov	dbf9d7da33	ext4: fix uniniatilized extent splitting error Fix bug reported by Dmitry Monakhov caused by lost error code Testcase: blksize = 0x1000; fd = open(argv[1], O_RDWR\|O_CREAT, 0700); unsigned long long sz = 0x10000000UL; /* allocating big blocks chunk / syscall(__NR_fallocate, fd, 0, 0UL, sz) / grab all other available filesystem space / tfd = open("tmp", O_RDWR\|O_CREAT\|O_DIRECT, 0700); while( write(tfd, buf, 4096) > 0); / loop untill ENOSPC / fsync(fd); / just in case / while (pos < sz) { / each seek+ write operation result in splits uninitialized extent in three extents. Splitting may result in new extent allocation which probably will fail because of ENOSPC/ lseek(fd, blksize2 -1, SEEK_CUR); if ((ret = write(fd, 'a', 1)) != 1) exit(1); pos += blksize * 2; } Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	ce40733ce9	ext4: Check for return value from sb_set_blocksize sb_set_blocksize validates whether the specfied block size can be used by the file system. Make sure we fail mounting the file system if the blocksize specfied cannot be used. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Miklos Szeredi	cb45bbe44b	ext4: Add stripe= option to /proc/mounts Add stripe= option to /proc/mounts for ext4 filesystems. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	3dbd0ede4d	ext4: Enable the multiblock allocator by default Enable the multiblock allocator by default. Fix ext4_show_options() so if it is not enabled, the nomballoc option included in /proc/mounts. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:26 -05:00
Alex Tomas	c9de560ded	ext4: Add multi block allocator for ext4 Signed-off-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-29 00:19:52 -05:00
Alex Tomas	1988b51e47	ext4: Add new functions for searching extent tree Add the functions ext4_ext_search_left() and ext4_ext_search_right(), which are used by mballoc during ext4_ext_get_blocks to decided whether to merge extent information. Signed-off-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Johann Lombardi <johann@clusterfs.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Eric Sandeen	c549a95d40	ext4: fix up EXT4FS_DEBUG builds Builds with EXT4FS_DEBUG defined (to enable ext4_debug()) fail without these changes. Clean up some format warnings too. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	aa22df2cc8	ext4: Fix ext4_show_options to show the correct mount options. We need to look at the default value and make sure the mount options are not set via default value before showing them via ext4_show_options Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	c14c6fd5c5	ext4: Add EXT4_IOC_MIGRATE ioctl The below patch add ioctl for migrating ext3 indirect block mapped inode to ext4 extent mapped inode. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:26 -05:00
Jean Noel Cordenner	25ec56b518	ext4: Add inode version support in ext4 This patch adds 64-bit inode version support to ext4. The lower 32 bits are stored in the osd1.linux1.l_i_version field while the high 32 bits are stored in the i_version_hi field newly created in the ext4_inode. This field is incremented in case the ext4_inode is large enough. A i_version mount option has been added to enable the feature. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@bull.net>	2008-01-28 23:58:27 -05:00
Jean Noel Cordenner	7a224228ed	vfs: Add 64 bit i_version support The i_version field of the inode is changed to be a 64-bit counter that is set on every inode creation and that is incremented every time the inode data is modified (similarly to the "ctime" time-stamp). The aim is to fulfill a NFSv4 requirement for rfc3530. This first part concerns the vfs, it converts the 32-bit i_version in the generic inode to a 64-bit, a flag is added in the super block in order to check if the feature is enabled and the i_version is incremented in the vfs. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@bull.net> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>	2008-01-28 23:58:27 -05:00
Girish Shilamkar	818d276ceb	ext4: Add the journal checksum feature The journal checksum feature adds two new flags i.e JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM. JBD2_FEATURE_CHECKSUM flag indicates that the commit block contains the checksum for the blocks described by the descriptor blocks. Due to checksums, writing of the commit record no longer needs to be synchronous. Now commit record can be sent to disk without waiting for descriptor blocks to be written to disk. This behavior is controlled using JBD2_FEATURE_ASYNC_COMMIT flag. Older kernels/e2fsck should not be able to recover the journal with _ASYNC_COMMIT hence it is made incompat. The commit header has been extended to hold the checksum along with the type of the checksum. For recovery in pass scan checksums are verified to ensure the sanity and completeness(in case of _ASYNC_COMMIT) of every transaction. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Girish Shilamkar <girish@clusterfs.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Johann Lombardi	8e85fb3f30	jbd2: jbd2 stats through procfs The patch below updates the jbd stats patch to 2.6.20/jbd2. The initial patch was posted by Alex Tomas in December 2005 (http://marc.info/?l=linux-ext4&m=113538565128617&w=2). It provides statistics via procfs such as transaction lifetime and size. Sometimes, investigating performance problems, i find useful to have stats from jbd about transaction's lifetime, size, etc. here is a patch for review and inclusion probably. for example, stats after creation of 3M files in htree directory: [root@bob ~]# cat /proc/fs/jbd/sda/history R/C tid wait run lock flush log hndls block inlog ctime write drop close R 261 8260 2720 0 0 750 9892 8170 8187 C 259 750 0 4885 1 R 262 20 2200 10 0 770 9836 8170 8187 R 263 30 2200 10 0 3070 9812 8170 8187 R 264 0 5000 10 0 1340 0 0 0 C 261 8240 3212 4957 0 R 265 8260 1470 0 0 4640 9854 8170 8187 R 266 0 5000 10 0 1460 0 0 0 C 262 8210 2989 4868 0 R 267 8230 1490 10 0 4440 9875 8171 8188 R 268 0 5000 10 0 1260 0 0 0 C 263 7710 2937 4908 0 R 269 7730 1470 10 0 3330 9841 8170 8187 R 270 0 5000 10 0 830 0 0 0 C 265 8140 3234 4898 0 C 267 720 0 4849 1 R 271 8630 2740 20 0 740 9819 8170 8187 C 269 800 0 4214 1 R 272 40 2170 10 0 830 9716 8170 8187 R 273 40 2280 0 0 3530 9799 8170 8187 R 274 0 5000 10 0 990 0 0 0 where, R - line for transaction's life from T_RUNNING to T_FINISHED C - line for transaction's checkpointing tid - transaction's id wait - for how long we were waiting for new transaction to start (the longest period journal_start() took in this transaction) run - real transaction's lifetime (from T_RUNNING to T_LOCKED lock - how long we were waiting for all handles to close (time the transaction was in T_LOCKED) flush - how long it took to flush all data (data=ordered) log - how long it took to write the transaction to the log hndls - how many handles got to the transaction block - how many blocks got to the transaction inlog - how many blocks are written to the log (block + descriptors) ctime - how long it took to checkpoint the transaction write - how many blocks have been written during checkpointing drop - how many blocks have been dropped during checkpointing close - how many running transactions have been closed to checkpoint this one all times are in msec. [root@bob ~]# cat /proc/fs/jbd/sda/info 280 transaction, each upto 8192 blocks average: 1633ms waiting for transaction 3616ms running transaction 5ms transaction was being locked 1ms flushing data (in ordered mode) 1799ms logging transaction 11781 handles per transaction 5629 blocks per transaction 5641 logged blocks per transaction Signed-off-by: Johann Lombardi <johann.lombardi@bull.net> Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	4df3d265bf	ext4: Take read lock during overwrite case. When we are overwriting a file and not actually allocating new file system blocks we need to take only the read lock on i_data_sem. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:29 -05:00
Aneesh Kumar K.V	0e855ac8b1	ext4: Convert truncate_mutex to read write semaphore. We are currently taking the truncate_mutex for every read. This would have performance impact on large CPU configuration. Convert the lock to read write semaphore and take read lock when we are trying to read the file. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:26 -05:00
Aneesh Kumar K.V	c278bfeceb	ext4: Make ext4_get_blocks_wrap take the truncate_mutex early. When doing a migrate from ext3 to ext4 inode we need to make sure the test for inode type and walking inode data happens inside lock. To make this happen move truncate_mutex early before checking the i_flags. This actually should enable us to remove the verify_chain(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Mariusz Kozlowski	01f4adc044	ext4: remove unused code from ext4_find_entry() The unused code found in ext3_find_entry() is also present (and still unused) in the ext4_find_entry() code. This patch removes it. Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	221879c927	ext4: Check for the correct error return from ext4_ext_get_blocks returns negative values on error. We should check for <= 0 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Jan Kara	f5a7a6b0d9	jbd2: Fix assertion failure in fs/jbd2/checkpoint.c Before we start committing a transaction, we call __journal_clean_checkpoint_list() to cleanup transaction's written-back buffers. If this call happens to remove all of them (and there were already some buffers), __journal_remove_checkpoint() will decide to free the transaction because it isn't (yet) a committing transaction and soon we fail some assertion - the transaction really isn't ready to be freed :). We change the check in __journal_remove_checkpoint() to free only a transaction in T_FINISHED state. The locking there is subtle though (as everywhere in JBD ;(). We use j_list_lock to protect the check and a subsequent call to __journal_drop_transaction() and do the same in the end of journal_commit_transaction() which is the only place where a transaction can get to T_FINISHED state. Probably I'm too paranoid here and such locking is not really necessary - checkpoint lists are processed only from log_do_checkpoint() where a transaction must be already committed to be processed or from __journal_clean_checkpoint_list() where kjournald itself calls it and thus transaction cannot change state either. Better be safe if something changes in future... Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	abcb2947c9	ext4: add block bitmap validation When a new block bitmap is read from disk in read_block_bitmap() there are a few bits that should ALWAYS be set. In particular, the blocks given corresponding to block bitmap, inode bitmap and inode tables. Validate the block bitmap against these blocks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	389d1b083c	Add buffer head related helper functions Add buffer head related helper function bh_uptodate_or_lock and bh_submit_read which can be used by file system Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:26 -05:00
Aneesh Kumar K.V	bb4f397a1a	ext4: Change the default behaviour on error ext4 file system was by default ignoring errors and continuing. This is not a good default as continuing on error could lead to file system corruption. Change the default to mark the file system readonly. Debian and ubuntu already does this as the default in their fstab. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:26 -05:00
Eric Sandeen	e7c9559300	ext4: fix oops on corrupted ext4 mount When mounting an ext4 filesystem with corrupted s_first_data_block, things can go very wrong and oops. Because blocks_count in ext4_fill_super is a u64, and we must use do_div, the calculation of db_count is done differently than on ext4. If first_data_block is corrupted such that it is larger than ext4_blocks_count, for example, then the intermediate blocks_count value may go negative, but sign-extend to a very large value: blocks_count = (ext4_blocks_count(es) - le32_to_cpu(es->s_first_data_block) + EXT4_BLOCKS_PER_GROUP(sb) - 1); This is then assigned to s_groups_count which is an unsigned long: sbi->s_groups_count = blocks_count; This may result in a value of 0xFFFFFFFF which is then used to compute db_count: db_count = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) / EXT4_DESC_PER_BLOCK(sb); and in this case db_count will wind up as 0 because the addition overflows 32 bits. This in turn causes the kmalloc for group_desc to be of 0 size: sbi->s_group_desc = kmalloc(db_count * sizeof (struct buffer_head *), GFP_KERNEL); and eventually in ext4_check_descriptors, dereferencing sbi->s_group_desc[desc_block] will result in a NULL pointer dereference. The simplest test seems to be to sanity check s_first_data_block, EXT4_BLOCKS_PER_GROUP, and ext4_blocks_count values to be sure their combination won't result in a bad intermediate value for blocks_count. We could just check for db_count == 0, but catching it at the root cause seems like it provides more info. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Adrian Bunk	07620f69ef	ext4/super.c: fix #ifdef's (CONFIG_EXT4_* -> CONFIG_EXT4DEV_*) Based on a report by Robert P. J. Day. Signed-off-by: Adrian Bunk <bunk@kernel.org>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	cb47dce791	ext4: Return after ext4_error in case of failures This fix some instances where we were continuing after calling ext4_error. ext4_error call panic only if errors=panic mount option is set. So we need to make sure we return correctly after ext4_error call Reported by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	fe7fdc37b5	ext3: Fix the max file size for ext3 file system. The max file size for ext3 file system is now calculated with hardcoded 4K block size. The patch fixes it to be calculated with the right block size. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:26 -05:00
Aneesh Kumar K.V	902be4c5ef	ext2: Fix the max file size for ext2 file system. The max file size for ext2 file system is now calculated with hardcoded 4K block size. The patch fixes it to be calculated with the right block size. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:26 -05:00
Eric Sandeen	e2b4657453	ext4: store maxbytes for bitmapped files and return EFBIG as appropriate Calculate & store the max offset for bitmapped files, and catch too-large seeks, truncates, and writes in ext4, shortening or rejecting as appropriate. Signed-off-by: Eric Sandeen <sandeen@redhat.com>	2008-01-28 23:58:27 -05:00
Eric Sandeen	19295529db	ext4: export iov_shorten from kernel for ext4's use Export iov_shorten() from kernel so that ext4 can truncate too-large writes to bitmapped files. Signed-off-by: Eric Sandeen <sandeen@redhat.com>	2008-01-28 23:58:27 -05:00
Eric Sandeen	cd2291a463	ext4: different maxbytes functions for bitmap & extent files use 2 different maxbytes functions for bitmapped & extent-based files. Signed-off-by: Eric Sandeen <sandeen@redhat.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	8180a5627d	ext4: Support large files This patch converts ext4_inode i_blocks to represent total blocks occupied by the inode in file system block size. Earlier the variable used to represent this in 512 byte block size. This actually limited the total size of the file. The feature is enabled transparently when we write an inode whose i_blocks cannot be represnted as 512 byte units in a 48 bit variable. inode flag EXT4_HUGE_FILE_FL Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	0fc1b45147	ext4: Add support for 48 bit inode i_blocks. Use the __le16 l_i_reserved1 field of the linux2 struct of ext4_inode to represet the higher 16 bits for i_blocks. With this change max_file size becomes (2*48 -1 ) 512 bytes. We add a RO_COMPAT feature to the super block to indicate that inode have i_blocks represented as a split 48 bits. Super block with this feature set cannot be mounted read write on a kernel with CONFIG_LSF disabled. Super block flag EXT4_FEATURE_RO_COMPAT_HUGE_FILE Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:26 -05:00
Aneesh Kumar K.V	a48380f769	ext4: Rename i_dir_acl to i_size_high Rename ext4_inode.i_dir_acl to i_size_high drop ext4_inode_info.i_dir_acl as it is not used Rename ext4_inode.i_size to ext4_inode.i_size_lo Add helper function for accessing the ext4_inode combined i_size. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	7973c0c19e	ext4: Rename i_file_acl to i_file_acl_lo Rename i_file_acl to i_file_acl_lo. This helps in finding bugs where we use i_file_acl instead of the combined i_file_acl_lo and i_file_acl_high Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	1d03ec984c	ext4: Fix sparse warnings. Fix sparse warnings related to static functions and local variables. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	99e6f829a8	ext4: Introduce ext4_update__feature Introduce ext4_update__feature and use them instead of opencoding. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Avantika Mathur	2aa9fc4c40	ext4: fixes block group number being set to a negative value This patch fixes various places where the group number is set to a negative value. Signed-off-by: Avantika Mathur <mathur@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Avantika Mathur	fd2d42912f	ext4: add ext4_group_t, and change all group variables to this type. In many places variables for block group are of type int, which limits the maximum number of block groups to 2^31. Each block group can have up to 2^15 blocks, with a 4K block size, and the max filesystem size is limited to 2^31 * (2^15 * 2^12) = 2^58 -- or 256 PB This patch introduces a new type ext4_group_t, of type unsigned long, to represent block group numbers in ext4. All occurrences of block group variables are converted to type ext4_group_t. Signed-off-by: Avantika Mathur <mathur@us.ibm.com>	2008-01-28 23:58:27 -05:00
Eric Sandeen	bba907433b	ext4 extents: remove unneeded casts There are many casts in extents.c which are not needed, as the variables are already the type of the cast, or are being promoted for no particular reason in printk's. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	725d26d3f0	ext4: Introduce ext4_lblk_t This patch adds a new data type ext4_lblk_t to represent the logical file blocks. This is the preparatory patch to support large files in ext4 The follow up patch with convert the ext4_inode i_blocks to represent the number of blocks in file system block size. This changes makes it possible to have a block number 2**32 -1 which will result in overflow if the block number is represented by signed long. This patch convert all the block number to type ext4_lblk_t which is typedef to __u32 Also remove dead code ext4_ext_walk_space Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>	2008-01-28 23:58:27 -05:00
Jan Kara	a72d7f834e	ext4: Avoid rec_len overflow with 64KB block size With 64KB blocksize, a directory entry can have size 64KB which does not fit into 16 bits we have for entry lenght. So we store 0xffff instead and convert value when read from / written to disk. The patch also converts some places to use ext4_next_entry() when we are changing them anyway. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Takashi Sato	afc7cbca5b	ext4: Support large blocksize up to PAGESIZE This patch set supports large block size(>4k, <=64k) in ext4, just enlarging the block size limit. But it is NOT possible to have 64kB blocksize on ext4 without some changes to the directory handling code. The reason is that an empty 64kB directory block would have a rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in the filesystem. The proposed solution is treat 64k rec_len with a an impossible value like rec_len = 0xffff to handle this. The Patch-set consists of the following 2 patches. [1/2] ext4: enlarge blocksize - Allow blocksize up to pagesize [2/2] ext4: fix rec_len overflow - prevent rec_len from overflow with 64KB blocksize Now on 64k page ppc64 box runs with this patch set we could create a 64k block size ext4dev, and able to handle empty directory block. Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Joel Becker	6b11d8179d	ocfs2: Fix userspace ABI breakage in sysfs The userspace ABI of ocfs2's internal cluster stack (o2cb) was broken by commit `c60b717879` "kset: convert ocfs2 to use kset_create". Specifically, the '/sys/o2cb' kset was moved to '/sys/fs/o2cb'. This breaks all ocfs2 tools and renders the filesystem unmountable. This fix moves '/sys/o2cb' back where it belongs. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-28 19:10:23 -08:00
Denis V. Lunev	b7c6ba6eb1	[NETNS]: Consolidate kernel netlink socket destruction. Create a specific helper for netlink kernel socket disposal. This just let the code look better and provides a ground for proper disposal inside a namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:07 -08:00
Denis V. Lunev	e5d69b9f4a	[ATM]: Oops reading net/atm/arp cat /proc/net/atm/arp causes the NULL pointer dereference in the get_proc_net+0xc/0x3a. This happens as proc_get_net believes that the parent proc dir entry contains struct net. Fix this assumption for "net/atm" case. The problem is introduced by the commit c0097b07abf5f92ab135d024dd41bd2aada1512f from Eric W. Biederman/Daniel Lezcano. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:36 -08:00
Denis V. Lunev	e372c41401	[NET]: Consolidate net namespace related proc files creation. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:28 -08:00
Jens Axboe	bbdfc2f706	[SPLICE]: Don't assume regular pages in splice_to_pipe() Allow caller to pass in a release function, there might be other resources that need releasing as well. Needed for network receive. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:30 -08:00
Adrian Bunk	3ff6eecca4	remove __attribute_used__ Remove the deprecated __attribute_used__. [Introduce __section in a few places to silence checkpatch /sam] Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>	2008-01-28 23:21:18 +01:00
WANG Cong	7491a76b23	FS: Remove dead code Remove dead code in smbfs makefile. Cc: Al Viro <viro@www.linux.org.uk> Cc: Tim Shimmin <xfs-masters@oss.sgi.com> Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>	2008-01-28 23:14:37 +01:00
Linus Torvalds	8d01eddf29	Merge branch 'for-2.6.25' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.25' of git://git.kernel.dk/linux-2.6-block: block: implement drain buffers __bio_clone: don't calculate hw/phys segment counts block: allow queue dma_alignment of zero blktrace: Add blktrace ioctls to SCSI generic devices	2008-01-29 08:51:56 +11:00
Jens Axboe	0871714e08	cfq-iosched: relax IOPRIO_CLASS_IDLE restrictions Currently you must be root to set idle io prio class on a process. This is due to the fact that the idle class is implemented as a true idle class, meaning that it will not make progress if someone else is requesting disk access. Unfortunately this means that it opens DOS opportunities by locking down file system resources, hence it is root only at the moment. This patch relaxes the idle class a little, by removing the truly idle part (which entals a grace period with associated timer). The modifications make the idle class as close to zero impact as can be done while still guarenteeing progress. This means we can relax the root only criteria as well. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 11:38:15 +01:00
Jens Axboe	d38ecf935f	io context sharing: preliminary support Detach task state from ioc, instead keep track of how many processes are accessing the ioc. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:50:31 +01:00
Jens Axboe	fd0928df98	ioprio: move io priority from task_struct to io_context This is where it belongs and then it doesn't take up space for a process that doesn't do IO. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:50:29 +01:00
Jens Axboe	5d84070ee0	__bio_clone: don't calculate hw/phys segment counts If the users sets a new ->bi_bdev on the bio after __bio_clone() has returned it, the "segment counts valid" flag still remains even though it may be different with the new target. So don't calculate segment counts in __bio_clone(). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:04:46 +01:00
Linus Torvalds	ef3f2de2b5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] DFS build fixes [CIFS] DFS support: provide shrinkable mounts [CIFS] Do not log path names in lookup errors [CIFS] DFS support patchset: Added mountdata [CIFS] Forgot to add two new files from previous commit [CIFS] DNS name resolution helper upcall for cifs [CIFS] fix checkpatch warnings in fs/cifs/inode.c [CIFS] hold ses sem on tcp session reconnect during mount [CIFS] Allow setting mode via cifs acl [CIFS] fix unicode string alignment in SPNEGO setup [CIFS] cifs_partialpagewrite() cleanup [CIFS] use krb5 session key from first SMB session after a NegProt [CIFS] redo existing session setup if needed in cifs_mount [CIFS] Only dump SPNEGO key if CONFIG_CIFS_DEBUG2 is set [CIFS] fix SetEA failure to some Samba versions	2008-01-26 23:01:20 -08:00
Linus Torvalds	9b73e76f3c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200 commits) [SCSI] usbstorage: use last_sector_bug flag universally [SCSI] libsas: abstract STP task status into a function [SCSI] ultrastor: clean up inline asm warnings [SCSI] aic7xxx: fix firmware build [SCSI] aacraid: fib context lock for management ioctls [SCSI] ch: remove forward declarations [SCSI] ch: fix device minor number management bug [SCSI] ch: handle class_device_create failure properly [SCSI] NCR5380: fix section mismatch [SCSI] sg: fix /proc/scsi/sg/devices when no SCSI devices [SCSI] IB/iSER: add logical unit reset support [SCSI] don't use __GFP_DMA for sense buffers if not required [SCSI] use dynamically allocated sense buffer [SCSI] scsi.h: add macro for enclosure bit of inquiry data [SCSI] sd: add fix for devices with last sector access problems [SCSI] fix pcmcia compile problem [SCSI] aacraid: add Voodoo Lite class of cards. [SCSI] aacraid: add new driver features flags [SCSI] qla2xxx: Update version number to 8.02.00-k7. [SCSI] qla2xxx: Issue correct MBC_INITIALIZE_FIRMWARE command. ...	2008-01-25 17:19:08 -08:00
Linus Torvalds	29bd17af7d	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (31 commits) ocfs2: clean up bh null checks ocfs2: document access rules for blocked_lock_list configfs: file.c fix possible recursive locking configfs: dir.c fix possible recursive locking configfs: Remove EXPERIMENTAL ocfs2: bump version number ocfs2/dlm: Clear joining_node on hearbeat node down ocfs2: convert byte order of constant instead of variable ocfs2: Update default cluster timeouts ocfs2: printf fixes ocfs2: Use generic_file_llseek ocfs2: Safer read_inline_data() ocfs2: Silence false lockdep warnings [PATCH 2/2] ocfs2: cluster aware flock() [PATCH 1/2] ocfs2: add flock lock type ocfs2: Local alloc window size changeable via mount option ocfs2: Support commit= mount option ocfs2: Add missing permission checks [PATCH 2/2] ocfs2: Implement group add for online resize [PATCH 1/2] ocfs2: Add group extend for online resize ...	2008-01-25 17:11:13 -08:00
Mark Fasheh	2fe5c1d7eb	ocfs2: clean up bh null checks If we know a buffer_head is non-null, then brelse() is unnecessary and put_bh() can be used instead. Also, an explicit check for NULL is unnecessary when using brelse(). This patch only covers buffer_head_io.c and resize.c, which have recently added code which exhibits this problem. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:48 -08:00
Mark Fasheh	7ec373cf33	ocfs2: document access rules for blocked_lock_list ocfs2_super->blocked_lock_list and ocfs2_super->blocked_lock_count have some usage restrictions which aren't immediately obvious to anyone reading the code. It's a good idea to document this so that we avoid making costly mistakes in the future. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:48 -08:00
Joonwoo Park	116ba5d5ea	configfs: file.c fix possible recursive locking configfs_register_subsystem() with default_groups triggers recursive locking. it seems that mutex_lock_nested is needed. ============================================= [ INFO: possible recursive locking detected ] 2.6.24-rc6 #145 --------------------------------------------- swapper/1 is trying to acquire lock: (&sb->s_type->i_mutex_key#3){--..}, at: [<c40c9a9e>] configfs_add_file+0x2e/0x70 but task is already holding lock: (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca985>] configfs_register_subsystem+0x55/0x130 other info that might help us debug this: 1 lock held by swapper/1: #0: (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca985>] configfs_register_subsystem+0x55/0x130 stack backtrace: Pid: 1, comm: swapper Not tainted 2.6.24-rc6 #145 [<c40053ba>] show_trace_log_lvl+0x1a/0x30 [<c4005e82>] show_trace+0x12/0x20 [<c400687e>] dump_stack+0x6e/0x80 [<c404ec72>] __lock_acquire+0xe62/0x1120 [<c404efb2>] lock_acquire+0x82/0xa0 [<c43fda88>] mutex_lock_nested+0x98/0x2e0 [<c40c9a9e>] configfs_add_file+0x2e/0x70 [<c40c9b0c>] configfs_create_file+0x2c/0x40 [<c40ca639>] configfs_attach_item+0x139/0x220 [<c40ca734>] configfs_attach_group+0x14/0x140 [<c40ca7e9>] configfs_attach_group+0xc9/0x140 [<c40ca9f6>] configfs_register_subsystem+0xc6/0x130 [<c45c8186>] init_netconsole+0x2b6/0x300 [<c45a75f2>] kernel_init+0x142/0x320 [<c4004fb3>] kernel_thread_helper+0x7/0x14 ======================= Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:47 -08:00
Joonwoo Park	ba611edfe4	configfs: dir.c fix possible recursive locking configfs_register_subsystem() with default_groups triggers recursive locking. it seems that mutex_lock_nested is needed. ============================================= [ INFO: possible recursive locking detected ] 2.6.24-rc6 #141 --------------------------------------------- swapper/1 is trying to acquire lock: (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca76f>] configfs_attach_group+0x4f/0x190 but task is already holding lock: (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca9d5>] configfs_register_subsystem+0x55/0x130 other info that might help us debug this: 1 lock held by swapper/1: #0: (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca9d5>] configfs_register_subsystem+0x55/0x130 stack backtrace: Pid: 1, comm: swapper Not tainted 2.6.24-rc6 #141 [<c40053ba>] show_trace_log_lvl+0x1a/0x30 [<c4005e82>] show_trace+0x12/0x20 [<c400687e>] dump_stack+0x6e/0x80 [<c404ec72>] __lock_acquire+0xe62/0x1120 [<c404efb2>] lock_acquire+0x82/0xa0 [<c43fdad8>] mutex_lock_nested+0x98/0x2e0 [<c40ca76f>] configfs_attach_group+0x4f/0x190 [<c40caa46>] configfs_register_subsystem+0xc6/0x130 [<c45c8186>] init_netconsole+0x2b6/0x300 [<c45a75f2>] kernel_init+0x142/0x320 [<c4004fb3>] kernel_thread_helper+0x7/0x14 ======================= Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:47 -08:00
Joel Becker	02ac0499c0	configfs: Remove EXPERIMENTAL configfs has been alive and kicking for a while now. It underpins some non-EXPERIMENTAL subsystems, such as OCFS2's cluster stack. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:47 -08:00
Mark Fasheh	0e5ae03203	ocfs2: bump version number Bump the printed version to 1.5.0. This helps us quickly identify which version of Ocfs2 a bug filer is running. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:46 -08:00
Tao Ma	2d4b1cbb44	ocfs2/dlm: Clear joining_node on hearbeat node down Currently the process of dlm join contains 2 steps: query join and assert join. After query join, the joined node will set its joining_node. So if the joining node happens to panic before the 2nd step, the joined node will fail to clear its joining_node flag because that node isn't in the domain map. It at least cause 2 problems. 1. All the new join request will fail. So no new node can mount the volume. 2. The joined node can't umount the volume since during the umount process it has to wait for the joining_node to be unknown. So the umount will be hanged. The solution is to clear the joining_node before we check the domain map. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:46 -08:00
Marcin Slusarz	4092d49f70	ocfs2: convert byte order of constant instead of variable Convert byte order of constant instead of variable it will be done at compile time vs run time. Remove unused le32_and_cpu. Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:46 -08:00
Sunil Mushran	17104683d2	ocfs2: Update default cluster timeouts Lots of people are having trouble with the default timeouts, which are too low. These new values are derived from an informal survey taken on ocfs2-users, as well as data from bug reports. This should reduce the amount of cluster disconnects and subsequent fencing seen during normal workloads. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:45 -08:00
Jan Kara	634bf74d1e	ocfs2: printf fixes Explicitely convert loff_t to long long in printf. Just for sure... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:45 -08:00
Jan Kara	32c3c0e2e5	ocfs2: Use generic_file_llseek We should use generic_file_llseek() and not default_llseek() so that s_maxbytes gets properly checked when seeking. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:45 -08:00
Jan Kara	d2849fb294	ocfs2: Safer read_inline_data() In ocfs2_read_inline_data() we should store file size in loff_t. Although the file size should fit in 32 bits we cannot be sure in case filesystem is corrupted. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:44 -08:00
Jan Kara	5fa0613ea5	ocfs2: Silence false lockdep warnings Create separate lockdep lock classes for system file's i_mutexes. They are used to guard allocations and similar things and thus rank differently than i_mutex of a regular file or directory. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:44 -08:00
Mark Fasheh	53fc622b9e	[PATCH 2/2] ocfs2: cluster aware flock() Hook up ocfs2_flock(), using the new flock lock type in dlmglue.c. A new mount option, "localflocks" is added so that users can revert to old functionality as need be. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:43 -08:00
Mark Fasheh	cf8e06f1a8	[PATCH 1/2] ocfs2: add flock lock type This adds a new dlmglue lock type which is intended to back flock() requests. Since these locks are driven from userspace, usage rules are much more liberal than the typical Ocfs2 internal cluster lock. As a result, we can't make use of most dlmglue features - lock caching and lock level optimizations in particular. Additionally, userspace is free to deadlock itself, so we have to deal with that in the same way as the rest of the kernel - by allowing a signal to abort a lock request. In order to keep ocfs2_cluster_lock() complexity down, ocfs2_file_lock() does it's own dlm coordination. We still use the same helper functions though, so duplicated code is kept to a minimum. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:43 -08:00
Sunil Mushran	2fbe8d1ebe	ocfs2: Local alloc window size changeable via mount option Local alloc is a performance optimization in ocfs2 in which a node takes a window of bits from the global bitmap and then uses that for all small local allocations. This window size is fixed to 8MB currently. This patch allows users to specify the window size in MB including disabling it by passing in 0. If the number specified is too large, the fs will use the default value of 8MB. mount -o localalloc=X /dev/sdX /mntpoint Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:43 -08:00
Mark Fasheh	d147b3d630	ocfs2: Support commit= mount option Mostly taken from ext3. This allows the user to set the jbd commit interval, in seconds. The default of 5 seconds stays the same, but now users can easily increase the commit interval. Typically, this would be increased in order to benefit performance at the expense of data-safety. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:42 -08:00
Mark Fasheh	0957f00796	ocfs2: Add missing permission checks Check that an online resize is being driven by a user with permission to change system resource limits. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:19 -08:00
Tao Ma	7909f2bf83	[PATCH 2/2] ocfs2: Implement group add for online resize This patch adds the ability for a userspace program to request that a properly formatted cluster group be added to the main allocation bitmap for an Ocfs2 file system. The request is made via an ioctl, OCFS2_IOC_GROUP_ADD. On a high level, this is similar to ext3, but we use a different ioctl as the structure which has to be passed through is different. During an online resize, tunefs.ocfs2 will format any new cluster groups which must be added to complete the resize, and call OCFS2_IOC_GROUP_ADD on each one. Kernel verifies that the core cluster group information is valid and then does the work of linking it into the global allocation bitmap. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:04:24 -08:00
Tao Ma	d659072f73	[PATCH 1/2] ocfs2: Add group extend for online resize This patch adds the ability for a userspace program to request an extend of last cluster group on an Ocfs2 file system. The request is made via ioctl, OCFS2_IOC_GROUP_EXTEND. This is derived from EXT3_IOC_GROUP_EXTEND, but is obviously Ocfs2 specific. tunefs.ocfs2 would call this for an online-resize operation if the last cluster group isn't full. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:53:35 -08:00
Tao Ma	e9d578a8f2	ocfs2: Initalize bitmap_cpg of ocfs2_super to be the maximum. This value is initialized from global_bitmap->id2.i_chain.cl_cpg. If there is only 1 group, it will be equal to the total clusters in the volume. So as for online resize, it should change for all the nodes in the cluster. It isn't easy and there is no corresponding lock for it. bitmap_cpg is only used in 2 areas: 1. Check whether the suballoc is too large for us to allocate from the global bitmap, so it is little used. And now the suballoc size is 2048, it rarely meet this situation and the check is almost useless. 2. Calculate which group a cluster belongs to. We use it during truncate to figure out which cluster group an extent belongs too. But we should be OK if we increase it though as the cluster group calculated shouldn't change and we only ever have a small bitmap_cpg on file systems with a single cluster group. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:48:54 -08:00
Mark Fasheh	1252c434e3	ocfs2: Documentation update Remove 'readpages' from the list in ocfs2.txt. Instead of having two identical lists, I just removed the list in the OCFS2 section of fs/Kconfig and added a pointer to Documentation/filesystems/ocfs2.txt. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:48:42 -08:00
Mark Fasheh	628a24f5bd	ocfs2: Readpages support Add ->readpages support to Ocfs2. This is rather trivial - all it required is a small update to ocfs2_get_block (for mapping full extents via b_size) and an ocfs2_readpages() function which partially mirrors ocfs2_readpage(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:48:12 -08:00
Mark Fasheh	e63aecb651	ocfs2: Rename ocfs2_meta_[un]lock Call this the "inode_lock" now, since it covers both data and meta data. This patch makes no functional changes. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:46:01 -08:00
Mark Fasheh	c934a92d05	ocfs2: Remove data locks The meta lock now covers both meta data and data, so this just removes the now-redundant data lock. Combining locks saves us a round of lock mastery per inode and one less lock to ping between nodes during read/write. We don't lose much - since meta locks were always held before a data lock (and at the same level) ordered writeout mode (the default) ensured that flushing for the meta data lock also pushed out data anyways. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:45:57 -08:00
Mark Fasheh	f1f540688e	ocfs2: Add data downconvert worker to inode lock In order to extend inode lock coverage to inode data, we use the same data downconvert worker with only a small modification to only do work for regular files. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:45:54 -08:00
Mark Fasheh	34d024f843	ocfs2: Remove mount/unmount votes The node maps that are set/unset by these votes are no longer relevant, thus we can remove the mount and umount votes. Since those are the last two remaining votes, we can also remove the entire vote infrastructure. The vote thread has been renamed to the downconvert thread, and the small amount of functionality related to managing it has been moved into fs/ocfs2/dlmglue.c. All references to votes have been removed or updated. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:45:34 -08:00
Mark Fasheh	6f7b056ea9	ocfs2: Remove fs dependency on ocfs2_heartbeat module Now that the dlm exposes domain information to us, we don't need generic node up / node down callbacks. And since the DLM is only telling us when a node goes down unexpectedly, we no longer need to optimize away node down callbacks via the umount map. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:36:40 -08:00
Mark Fasheh	6561168cb4	ocfs2_dlm: Call node eviction callbacks from heartbeat handler With this, a dlm client can take advantage of the group protocol in the dlm to get full notification whenever a node within the dlm domain leaves unexpectedly. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:36:40 -08:00
Linus Torvalds	0008bf5440	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: (96 commits) sched: keep total / count stats in addition to the max for sched, futex: detach sched.h and futex.h sched: fix: don't take a mutex from interrupt context sched: print backtrace of running tasks too printk: use ktime_get() softlockup: fix signedness sched: latencytop support sched: fix goto retry in pick_next_task_rt() timers: don't #error on higher HZ values sched: monitor clock underflows in /proc/sched_debug sched: fix rq->clock warps on frequency changes sched: fix, always create kernel threads with normal priority debug: clean up kernel/profile.c sched: remove the !PREEMPT_BKL code sched: make PREEMPT_BKL the default debug: track and print last unloaded module in the oops trace debug: show being-loaded/being-unloaded indicator for modules sched: rt-watchdog: fix .rlim_max = RLIM_INFINITY sched: rt-group: reduce rescheduling hrtimer: unlock hrtimer_wakeup ...	2008-01-25 13:42:32 -08:00
Linus Torvalds	2d94dfc8c3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: mount options: fix jfs JFS: simplify types to get rid of sparse warning JFS: FIx one more plain integer as NULL pointer warning JFS: Remove defconfig ptr comparison to 0 JFS: use DIV_ROUND_UP where appropriate Remove unnecessary kmalloc casts in the jfs filesystem JFS is missing a memory barrier JFS: Make sure special inode data is written after journal is flushed JFS: clear PAGECACHE_TAG_DIRTY for no-write pages	2008-01-25 12:20:32 -08:00
Arjan van de Ven	9745512ce7	sched: latencytop support LatencyTOP kernel infrastructure; it measures latencies in the scheduler and tracks it system wide and per process. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-25 21:08:34 +01:00
Paul E. McKenney	e260be673a	Preempt-RCU: implementation This patch implements a new version of RCU which allows its read-side critical sections to be preempted. It uses a set of counter pairs to keep track of the read-side critical sections and flips them when all tasks exit read-side critical section. The details of this implementation can be found in this paper - http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf and the article- http://lwn.net/Articles/253651/ This patch was developed as a part of the -rt kernel development and meant to provide better latencies when read-side critical sections of RCU don't disable preemption. As a consequence of keeping track of RCU readers, the readers have a slight overhead (optimizations in the paper). This implementation co-exists with the "classic" RCU implementations and can be switched to at compiler. Also includes RCU tracing summarized in debugfs. [ akpm@linux-foundation.org: build fixes on non-preempt architectures ] Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Reviewed-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-01-25 21:08:24 +01:00
Linus Torvalds	b47711bfbc	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6: selinux: make mls_compute_sid always polyinstantiate security/selinux: constify function pointer tables and fields security: add a secctx_to_secid() hook security: call security_file_permission from rw_verify_area security: remove security_sb_post_mountroot hook Security: remove security.h include from mm.h Security: remove security_file_mmap hook sparse-warnings (NULL as 0). Security: add get, set, and cloning of superblock security information security/selinux: Add missing "space"	2008-01-25 08:44:29 -08:00
Linus Torvalds	e07dd2ad30	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (56 commits) [GFS2] Allow journal recovery on read-only mount [GFS2] Lockup on error [GFS2] Fix page_mkwrite truncation race path [GFS2] Fix typo [GFS2] Fix write alloc required shortcut calculation [GFS2] gfs2_alloc_required performance [GFS2] Remove unneeded i_spin [GFS2] Reduce inode size by moving i_alloc out of line [GFS2] Fix assert in log code [GFS2] Fix problems relating to execution of files on GFS2 [GFS2] Initialize extent_list earlier [GFS2] Allow page migration for writeback and ordered pages [GFS2] Remove unused variable [GFS2] Fix log block mapper [GFS2] Minor correction [GFS2] Eliminate the no longer needed sd_statfs_mutex [GFS2] Incremental patch to fix compiler warning [GFS2] Function meta_read optimization [GFS2] Only fetch the dinode once in block_map [GFS2] Reorganize function gfs2_glmutex_lock ...	2008-01-25 08:39:18 -08:00
Steve French	366781c196	[CIFS] DFS build fixes Also includes a few minor changes suggested by Christoph Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-01-25 10:12:41 +00:00
Abhijith Das	7bc5c414fe	[GFS2] Allow journal recovery on read-only mount This patch allows gfs2 to perform journal recovery even if it is mounted read-only. Strictly speaking, a read-only mount should not be writing to the filesystem, but we do this only to perform journal recovery. A read-only mount will fail if we don't recover the dirty journal. Also, when gfs2 is used as a root filesystem, it will be mounted read-only before being mounted read-write during the boot sequence. A failed read-only mount will panic the machine during bootup. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:21:22 +00:00
Bob Peterson	1b8177ec1e	[GFS2] Lockup on error I spotted this bug while I was digging around. Looks like it could cause a lockup in some rare error condition. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:21:04 +00:00
Steven Whitehouse	b7fe2e391e	[GFS2] Fix page_mkwrite truncation race path There was a bug in the truncation/invalidation race path for ->page_mkwrite for gfs2. It ought to return 0 so that the effect is the same as if the page was truncated at any of the other points at which the page_lock is dropped. This will result in the restart of the whole page fault path. If it was due to a real truncation (as opposed to an invalidate because we let a glock go) then the ->fault path will pick that up when it gets called again. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:20:15 +00:00
Bob Peterson	3e5cd0877e	[GFS2] Fix typo This patch fixes a minor typo. Surprisingly, it still compiled. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:19:51 +00:00
Steven Whitehouse	1af535727b	[GFS2] Fix write alloc required shortcut calculation The comparison was being made against the wrong quantity. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:19:28 +00:00
Bob Peterson	0522053519	[GFS2] gfs2_alloc_required performance This is a small I/O performance enhancement to gfs2. (Actually, it is a rework of an earlier version I got wrong). The idea here is to check if the write extends past the last block in the file. If so, the function can save itself a lot of time and trouble because it knows an allocate will be required. Benchmarks like iozone should see better performance. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:19:03 +00:00
Bob Peterson	598278bd48	[GFS2] Remove unneeded i_spin This patch removes a vestigial variable "i_spin" from the gfs2_inode structure. This not only saves us memory (>300000 of these in memory for the oom test) it also saves us time because we don't have to spend time initializing it (i.e. slightly better performance). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:18:44 +00:00
Steven Whitehouse	6dbd822487	[GFS2] Reduce inode size by moving i_alloc out of line It is possible to reduce the size of GFS2 inodes by taking the i_alloc structure out of the gfs2_inode. This patch allocates the i_alloc structure whenever its needed, and frees it afterward. This decreases the amount of low memory we use at the expense of requiring a memory allocation for each page or partial page that we write. A quick test with postmark shows that the overhead is not measurable and I also note that OCFS2 use the same approach. In the future I'd like to solve the problem by shrinking down the size of the members of the i_alloc structure, but for now, this reduces the immediate problem of using too much low-memory on x86 and doesn't add too much overhead. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:18:25 +00:00
Steven Whitehouse	ac39aadd04	[GFS2] Fix assert in log code Although the values were all being calculated correctly, there was a race in the assert due to the way it was using atomic variables. This changes the value we assert on so that we get the same effect by testing a different variable. This prevents the assert triggering when it shouldn't. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:18:03 +00:00
Steven Whitehouse	9656b2c14c	[GFS2] Fix problems relating to execution of files on GFS2 This patch fixes a couple of problems which affected the execution of files on GFS2. The first is that there was a corner case where inodes were not always uptodate at the point at which permissions checks were being carried out, this was resulting in refusal of execute permission, but only on the first lookup, subsequent requests worked correctly. The second was a problem relating to incorrect updating of file sizes which was introduced with the write_begin/end code for GFS2 a little while back. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>	2008-01-25 08:17:31 +00:00
Bob Peterson	0811a127cb	[GFS2] Initialize extent_list earlier Here is a patch for the latest upstream GFS2 code: The journal extent map needs to be initialized sooner than it currently is. Otherwise failed mount attempts (e.g. not enough journals, etc.) may panic trying to access the uninitialized list. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:17:04 +00:00
Steven Whitehouse	e5d9dc278c	[GFS2] Allow page migration for writeback and ordered pages To improve performance on NUMA, we use the VM's standard page migration for writeback and ordered pages. Probably we could also do the same for journaled data, but that would need a careful audit of the code, so will be the subject of a later patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:16:41 +00:00
Steven Whitehouse	65a6290998	[GFS2] Remove unused variable The go_drop_th function is never called or referenced. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:16:19 +00:00
Steven Whitehouse	ff91cc9bb4	[GFS2] Fix log block mapper A missing offset in the calculation. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:15:58 +00:00
Bob Peterson	fa3742fa85	[GFS2] Minor correction This is a small correction to my previously posted patch1. It just changes a divide to a shift. It's faster and doesn't introduce odd dependencies on 32-bit compiles. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:15:37 +00:00
Bob Peterson	c3f60b6e3a	[GFS2] Eliminate the no longer needed sd_statfs_mutex This patch eliminates the unneeded sd_statfs_mutex mutex but preserves the ordering as discussed. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:15:16 +00:00
Bob Peterson	b3513fca7e	[GFS2] Incremental patch to fix compiler warning Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:14:53 +00:00
Bob Peterson	15c7cee799	[GFS2] Function meta_read optimization This patch optimizes function gfs2_meta_read. Basically, gfs2_meta_wait was being called regardless of whether a disk read was requested. This just pulls that wait into the if that triggers the read. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:14:33 +00:00
Bob Peterson	b0d5fd3074	[GFS2] Only fetch the dinode once in block_map Function gfs2_block_map was often looking up the disk inode twice. This optimizes it so that only does it once. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:14:13 +00:00
Bob Peterson	398bbe6832	[GFS2] Reorganize function gfs2_glmutex_lock This patch optimizes the function gfs2_glmutex_lock. The basic theory is: Why bother initializing a holder, setting up wait bits and then waiting on them, if you know the glock can be yours. So the holder stuff is placed inside the if checking if the glock is locked. This one needs careful scrutiny because changing anything to do with locking should strike terror into one's heart. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:13:52 +00:00
Bob Peterson	5fdc2eeb5d	[GFS2] Run through full bitmaps quicker in gfs2_bitfit I eliminated the passing of an unused parameter into gfs2_bitfit called rgd. This also changes the gfs2_bitfit code that searches for free (or used) blocks. Before, the code was trying to check for bytes that indicated 4 blocks in the undesired state. The problem is, it was spending more time trying to do this than it actually was saving. This version only optimizes the case where we're looking for free blocks, and it checks a machine word at a time. So on 32-bit machines, it will check 32-bits (16 blocks) and on 64-bit machines, it will check 64-bits (32 blocks) at a time. The compiler optimizes that quite well and we save some time, especially when running through full bitmaps (like the bitmaps allocated for the journals). There's probably a more elegant or optimized way to do this, but I haven't thought of it yet. I'm open to suggestions. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:13:31 +00:00
Bob Peterson	0d0868bde3	[GFS2] Get rid of useless "found" variable in quota.c This just eliminates an unused variable from the quota code. Not likely to be a time saver. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:13:01 +00:00
Bob Peterson	da6dd40d59	[GFS2] Journal extent mapping This patch saves a little time when gfs2 writes to the journals by keeping a mapping between logical and physical blocks on disk. That's better than constantly looking up indirect pointers in buffers, when the journals are several levels of indirection (which they typically are). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:11:46 +00:00

... 2 3 4 5 6 ...

7704 Commits