Commit ca4aa09635 fixed waiting for the
structure to get initialised, but it is also possible to break out of
the loop while still in TASK_INTERRUPTIBLE.
Replace the whole thing by wait_event_interruptible, which is much more
readable, and doesn't suffer from these problems.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NFS_CS_INITING > NFS_CS_READY, so instead of waiting for the structure to
get initialised, we currently immediately jump out of the loop without ever
sleeping.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace references to system_utsname to the per-process uts namespace
where appropriate. This includes things like uname.
Changes: Per Eric Biederman's comments, use the per-process uts namespace
for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c
[jdike@addtoit.com: UML fix]
[clg@fr.ibm.com: cleanup]
[akpm@osdl.org: build fix]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It isn't needed as it is available in rqstp->rq_server, and dropping it allows
some local vars to be dropped.
[akpm@osdl.org: build fix]
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently lockd listens on UDP always, and TCP if CONFIG_NFSD_TCP is set.
However as lockd performs services of the client as well, this is a problem.
If CONFIG_NfSD_TCP is not set, and a tcp mount is used, the server will not be
able to call back to lockd.
So:
- add an option to lockd_up saying which protocol is needed
- Always open sockets for which an explicit port was given, otherwise
only open a socket of the type required
- Change nfsd to do one lockd_up per socket rather than one per thread.
This
- removes the dependancy on CONFIG_NFSD_TCP
- means that lockd may open sockets other than at startup
- means that lockd will *not* listen on UDP if the only
mounts are TCP mount (and nfsd hasn't started).
The latter is the only one that concerns me at all - I don't know if this
might be a problem with some servers.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
nfsd has some cleanup that it wants to do when the last thread exits, and
there will shortly be some more. So collect this all into one place and
define a callback for an rpc service to call when the service is about to be
destroyed.
[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some filesystems, instead of simply decrementing i_nlink, simply zero it
during an unlink operation. We need to catch these in addition to the
decrement operations.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When a filesystem decrements i_nlink to zero, it means that a write must be
performed in order to drop the inode from the filesystem.
We're shortly going to have keep filesystems from being remounted r/o between
the time that this i_nlink decrement and that write occurs.
So, add a little helper function to do the decrements. We'll tie into it in a
bit to note when i_nlink hits zero.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch vectorizes aio_read() and aio_write() methods to prepare for
collapsing all aio & vectored operations into one interface - which is
aio_read()/aio_write().
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Michael Holzheu <HOLZHEU@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove inclusions of linux/mpage.h that are no longer necessary due to the
transfer of generic_writepages().
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This eliminates the i_blksize field from struct inode. Filesystems that want
to provide a per-inode st_blksize can do so by providing their own getattr
routine instead of using the generic_fillattr() function.
Note that some filesystems were providing pretty much random (and incorrect)
values for i_blksize.
[bunk@stusta.de: cleanup]
[akpm@osdl.org: generic_fillattr() fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Rougly half of callers already do it by not checking return value
* Code in drivers/acpi/osl.c does the following to be sure:
(void)kmem_cache_destroy(cache);
* Those who check it printk something, however, slab_error already printed
the name of failed cache.
* XFS BUGs on failed kmem_cache_destroy which is not the decision
low-level filesystem driver should make. Converted to ignore.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some file systems want to manually d_move() the dentries involved in a
rename. We can do this by making use of the FS_ODD_RENAME flag if we just
have nfs_rename() unconditionally do the d_move(). While there, we rename
the flag to be more descriptive.
OCFS2 uses this to protect that part of the rename operation with a cluster
lock.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Comments-only change to clarify a detail of the NFS protocol and how it is
implemented in Linux.
Test plan:
None.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
RFC3530 states that the registered port 2049 for the NFS protocol should be
the default configuration in order to allow clients not to use the RPC
binding protocols.
If the mount program sends us a port=0, we therefore substitute port=2049.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Retry a few times before we give up: the error is usually due to ordering
issues with asynchronous RPC calls.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
And slight optimisation of nfs_end_data_update(): directories never have
delegations anyway.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently, a read() request will return EIO even if the file has been
deleted on the server, simply because that is what the VM will return
if the call to readpage() fails to update the page.
Ensure that readpage() marks the inode as stale if it receives an ESTALE.
Then return that error to userland.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the open intents tell us that a given lookup is going to result in a,
exclusive create, we currently optimize away the lookup call itself. The
reason is that the lookup would not be atomic with the create RPC call, so
why do it in the first place?
A problem occurs, however, if the VFS aborts the exclusive create operation
after the lookup, but before the call to create the file/directory: in this
case we will end up with a hashed negative dentry in the dcache that has
never been looked up.
Fix this by only actually hashing the dentry once the create operation has
been successfully completed.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix an oops when the referral server is not responding.
Check the error return from nfs4_set_client() in nfs4_create_referral_server.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Teach NFS_ROOT to use the new rpc_create API instead of the old two-call
API for creating an RPC transport.
Test plan:
Compile the kernel with the NFS client build-in, and set CONFIG_NFS_ROOT.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix up warnings from compiling on ppc64.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Now that we have a copy of the symlink path in the page cache, we can pass
a struct page down to the XDR routines instead of a string buffer.
Test plan:
Connectathon, all NFS versions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently the NFS client does not cache symlinks it creates. They get
cached only when the NFS client reads them back from the server.
Copy the symlink into the page cache before sending it.
Test plan:
Connectathon, all NFS versions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the LOOKUP or GETATTR in nfs_instantiate fail, nfs_instantiate will do a
d_drop before returning. But some callers already do a d_drop in the case
of an error return. Make certain we do only one d_drop in all error paths.
This issue was introduced because over time, the symlink proc API diverged
slightly from the create/mkdir/mknod proc API. To prevent other coding
mistakes of this type, change the symlink proc API to be more like
create/mkdir/mknod and move the nfs_instantiate call into the symlink proc
routines so it is used in exactly the same way for create, mkdir, mknod,
and symlink.
Test plan:
Connectathon, all versions of NFS.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In the early days of NFS, there was no duplicate reply cache on the server.
Thus retransmitted non-idempotent requests often found that the request had
already completed on the server. To avoid passing an unanticipated return
code to unsuspecting applications, NFS clients would often shunt error
codes that implied the request had been retried but already completed.
Thanks to NFS over TCP, duplicate reply caches on the server, and network
performance and reliability improvements, it is safe to remove such checks.
Test plan:
None.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Convert NFS client mount logic to use rpc_create() instead of the old
xprt_create_proto/rpc_create_client API.
Test plan:
Mount stress tests.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
include/linux/sunrpc/clnt.h already includes include/linux/sunrpc/xprt.h.
We can remove xprt.h from source files that already include clnt.h.
Likewise include/linux/sunrpc/timer.h.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Invoke security_d_instantiate() on root dentries after allocating them with
dentry_alloc_anon(). Normally dentry_alloc_root() would do that, but we don't
call that as we don't want to assign a name to the root dentry at this point
(we may discover the real name later).
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix an error handling problem: nfs_put_client() can be given a NULL pointer if
nfs_free_server() is asked to destroy a partially initialised record.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make two new proc files available:
/proc/fs/nfsfs/servers
/proc/fs/nfsfs/volumes
The first lists the servers with which we are currently dealing (struct
nfs_client), and the second lists the volumes we have on those servers (struct
nfs_server).
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The attached patch makes NFS share superblocks between mounts from the same
server and FSID over the same protocol.
It does this by creating each superblock with a false root and returning the
real root dentry in the vfsmount presented by get_sb(). The root dentry set
starts off as an anonymous dentry if we don't already have the dentry for its
inode, otherwise it simply returns the dentry we already have.
We may thus end up with several trees of dentries in the superblock, and if at
some later point one of anonymous tree roots is discovered by normal filesystem
activity to be located in another tree within the superblock, the anonymous
root is named and materialises attached to the second tree at the appropriate
point.
Why do it this way? Why not pass an extra argument to the mount() syscall to
indicate the subpath and then pathwalk from the server root to the desired
directory? You can't guarantee this will work for two reasons:
(1) The root and intervening nodes may not be accessible to the client.
With NFS2 and NFS3, for instance, mountd is called on the server to get
the filehandle for the tip of a path. mountd won't give us handles for
anything we don't have permission to access, and so we can't set up NFS
inodes for such nodes, and so can't easily set up dentries (we'd have to
have ghost inodes or something).
With this patch we don't actually create dentries until we get handles
from the server that we can use to set up their inodes, and we don't
actually bind them into the tree until we know for sure where they go.
(2) Inaccessible symbolic links.
If we're asked to mount two exports from the server, eg:
mount warthog:/warthog/aaa/xxx /mmm
mount warthog:/warthog/bbb/yyy /nnn
We may not be able to access anything nearer the root than xxx and yyy,
but we may find out later that /mmm/www/yyy, say, is actually the same
directory as the one mounted on /nnn. What we might then find out, for
example, is that /warthog/bbb was actually a symbolic link to
/warthog/aaa/xxx/www, but we can't actually determine that by talking to
the server until /warthog is made available by NFS.
This would lead to having constructed an errneous dentry tree which we
can't easily fix. We can end up with a dentry marked as a directory when
it should actually be a symlink, or we could end up with an apparently
hardlinked directory.
With this patch we need not make assumptions about the type of a dentry
for which we can't retrieve information, nor need we assume we know its
place in the grand scheme of things until we actually see that place.
This patch reduces the possibility of aliasing in the inode and page caches for
inodes that may be accessed by more than one NFS export. It also reduces the
number of superblocks required for NFS where there are many NFS exports being
used from a server (home directory server + autofs for example).
This in turn makes it simpler to do local caching of network filesystems, as it
can then be guaranteed that there won't be links from multiple inodes in
separate superblocks to the same cache file.
Obviously, cache aliasing between different levels of NFS protocol could still
be a problem, but at least that gives us another key to use when indexing the
cache.
This patch makes the following changes:
(1) The server record construction/destruction has been abstracted out into
its own set of functions to make things easier to get right. These have
been moved into fs/nfs/client.c.
All the code in fs/nfs/client.c has to do with the management of
connections to servers, and doesn't touch superblocks in any way; the
remaining code in fs/nfs/super.c has to do with VFS superblock management.
(2) The sequence of events undertaken by NFS mount is now reordered:
(a) A volume representation (struct nfs_server) is allocated.
(b) A server representation (struct nfs_client) is acquired. This may be
allocated or shared, and is keyed on server address, port and NFS
version.
(c) If allocated, the client representation is initialised. The state
member variable of nfs_client is used to prevent a race during
initialisation from two mounts.
(d) For NFS4 a simple pathwalk is performed, walking from FH to FH to find
the root filehandle for the mount (fs/nfs/getroot.c). For NFS2/3 we
are given the root FH in advance.
(e) The volume FSID is probed for on the root FH.
(f) The volume representation is initialised from the FSINFO record
retrieved on the root FH.
(g) sget() is called to acquire a superblock. This may be allocated or
shared, keyed on client pointer and FSID.
(h) If allocated, the superblock is initialised.
(i) If the superblock is shared, then the new nfs_server record is
discarded.
(j) The root dentry for this mount is looked up from the root FH.
(k) The root dentry for this mount is assigned to the vfsmount.
(3) nfs_readdir_lookup() creates dentries for each of the entries readdir()
returns; this function now attaches disconnected trees from alternate
roots that happen to be discovered attached to a directory being read (in
the same way nfs_lookup() is made to do for lookup ops).
The new d_materialise_unique() function is now used to do this, thus
permitting the whole thing to be done under one set of locks, and thus
avoiding any race between mount and lookup operations on the same
directory.
(4) The client management code uses a new debug facility: NFSDBG_CLIENT which
is set by echoing 1024 to /proc/net/sunrpc/nfs_debug.
(5) Clone mounts are now called xdev mounts.
(6) Use the dentry passed to the statfs() op as the handle for retrieving fs
statistics rather than the root dentry of the superblock (which is now a
dummy).
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Start rpciod in the server common (nfs_client struct) management code rather
than in the superblock management code. This means we only need to "start" it
once per server instead of once per superblock.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Eliminate nfs_server::client_sys in favour of nfs_client::cl_rpcclient as we
only really need one per server that we're talking to since it doesn't have any
security on it.
The retransmission management variables are also moved to the common struct as
they're required to set up the cl_rpcclient connection.
The NFS2/3 client and client_acl connections are thenceforth derived by cloning
the cl_rpcclient connection and post-applying the authorisation flavour.
The code for setting up the initial common connection has been moved to
client.c as nfs_create_rpc_client(). All the NFS program definition tables are
also moved there as that's where they're now required rather than super.c.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Move the rpc_ops from the nfs_server struct to the nfs_client struct as they're
common to all server records of a particular NFS protocol version.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make better use of inode* dereferencing macros to hide dereferencing chains
(including NFS_PROTO and NFS_CLIENT).
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Maintain a common server record for NFS2/3 as well as for NFS4 so that common
stuff can be moved there from struct nfs_server.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add some extra const qualifiers into NFS.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Use the nominated dentry's superblock directly in the NFS statfs() op to get a
file handle, rather than using s_root (which will become a dummy dentry in a
future patch).
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Generalise the nfs_client structure by:
(1) Moving nfs_client to a more general place (nfs_fs_sb.h).
(2) Renaming its maintenance routines to be non-NFS4 specific.
(3) Move those maintenance routines to a new non-NFS4 specific file (client.c)
and move the declarations to internal.h.
(4) Make nfs_find/get_client() take a full sockaddr_in to include the port
number (will be required for NFS2/3).
(5) Make nfs_find/get_client() take the NFS protocol version (again will be
required to differentiate NFS2, 3 & 4 client records).
Also:
(6) Make nfs_client construction proceed akin to inodes, marking them as under
construction and providing a function to indicate completion.
(7) Make nfs_get_client() wait interruptibly if it finds a client that it can
share, but that client is currently being constructed.
(8) Make nfs4_create_client() use (6) and (7) instead of locking cl_sem.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a set_capabilities NFS RPC op so that the server capabilities can be set.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a lookup filehandle NFS RPC op so that a file handle can be looked up
without requiring dentries and inodes and other VFS stuff when doing an NFS4
pathwalk during mounting.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Return an error when starting the idmapping pipe so that we can detect it
failing.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Rename nfs_server::nfs4_state to nfs_client as it will be used to represent the
client state for NFS2 and NFS3 also.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Rename struct nfs4_client to struct nfs_client so that it can become the basis
for a general client record for NFS2 and NFS3 in addition to NFS4.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make the nfs_callback_up()/down() prototypes just do nothing if NFS4 is not
enabled. Also make the down function void type since we can't really do
anything if it fails.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Rename the NFS4 version of nfs_stat_to_errno() so that it doesn't conflict with
the common one used by NFS2 and NFS3.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix ups for the splitting of the superblock stuff out of fs/nfs/inode.c,
including:
(*) Move the callback tcpport module param into callback.c.
(*) Move the idmap cache timeout module param into idmap.c.
(*) Changes to internal.h:
(*) namespace-nfs4.c was renamed to nfs4namespace.c.
(*) nfs_stat_to_errno() is in nfs2xdr.c, not nfs4xdr.c.
(*) nfs4xdr.c is contingent on CONFIG_NFS_V4.
(*) nfs4_path() is only uses if CONFIG_NFS_V4 is set.
Plus also:
(*) The sec_flavours[] table should really be const.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
A pinned inode may in theory end up filling memory with cached ACCESS
calls. This patch ensures that the VM may shrink away the cache in these
particular cases.
The shrinker works by iterating through the list of inodes on the global
nfs_access_lru_list, and removing the least recently used access
cache entry until it is done (or until the entire cache is empty).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The current access cache only allows one entry at a time to be cached for each
inode. Add a per-inode red-black tree in order to allow more than one to
be cached at a time.
Should significantly cut down the time spent in path traversal for shared
directories such as ${PATH}, /usr/share, etc.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The logic in nfs_direct_read_schedule and nfs_direct_write_schedule can
allow data->npages to be one larger than rpages. This causes a page
pointer to be written beyond the end of the pagevec in nfs_read_data (or
nfs_write_data).
Fix this by making nfs_(read|write)_alloc() calculate the size of the
pagevec array, and initialise data->npages.
Also get rid of the redundant argument to nfs_commit_alloc().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is needed in order to handle any NFS4ERR_DELAY errors that might be
returned by the server. It also ensures that we map the NFSv4 errors before
they are returned to userland.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 71c12b3f0abc7501f6ed231a6d17bc9c05a238dc commit)
Check the bounds of length specifiers more thoroughly in the XDR decoding of
NFS4 readdir reply data.
Currently, if the server returns a bitmap or attr length that causes the
current decode point pointer to wrap, this could go undetected (consider a
small "negative" length on a 32-bit machine).
Also add a check into the main XDR decode handler to make sure that the amount
of data is a multiple of four bytes (as specified by RFC-1014). This makes
sure that we can do u32* pointer subtraction in the NFS client without risking
an undefined result (the result is undefined if the pointers are not correctly
aligned with respect to one another).
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 5861fddd64a7eaf7e8b1a9997455a24e7f688092 commit)
The problem is that we may be caching writes that would extend the file and
create a hole in the region that we are reading. In this case, we need to
detect the eof from the server, ensure that we zero out the pages that
are part of the hole and mark them as up to date.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 856b603b01b99146918c093969b6cb1b1b0f1c01 commit)
rpc_unlink() and rpc_rmdir() will dput the dentry reference for you.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from a05a57effa71a1f67ccbfc52335c10c8b85f3f6a commit)
nfs_wb_page() waits on request completion and, as a result, is not safe to be
called from nfs_release_page() invoked by VM scanner as part of GFP_NOFS
allocation. Fix possible deadlock by analyzing gfp mask and refusing to
release page if __GFP_FS is not set.
Signed-off-by: Nikita Danilov <danilov@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 374d969debfb290bafcb41d28918dc6f7e43ce31 commit)
nfs_writedata_free() and nfs_readdata_free() can now become static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 5e1ce40f0c3c8f67591aff17756930d7a18ceb1a commit)
In one of the error paths of nfs_path, it may return with dcache_lock still
held; fix this by adding and using a new error path Elong_unlock which unlocks
dcache_lock.
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from f4b90b43677fb23297c56802c3056fc304f988d9 commit)
In the case when compiling via a symlink tree, we want to ensure that the
close-to-open GETATTR call is applied only to the final file, and not to
the symlink.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The introduction of the FLUSH_INVALIDATE argument to nfs_sync_inode_wait()
does not clear the nr_unstable page state counter for pages that are being
released.
Also fix a longstanding similar bug when nfs_commit_list() fails.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Use FL_ACCESS flag to test and/or wait for local locks before we try
requesting a lock from the server
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Use the new behaviour of {flock,posix}_file_lock(F_UNLCK) to determine if
we held a lock, and only send the RPC request to the server if this was the
case.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This fixes a bug in fs/nfs which makes it impossible to build nfs
without having procfs enabled.
Signed-off-by: Dominik Hackl <dominik@hackl.dhs.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Conversion of nr_unstable to a per zone counter
We need to do some special modifications to the nfs code since there are
multiple cases of disposition and we need to have a page ref for proper
accounting.
This converts the last critical page state of the VM and therefore we need to
remove several functions that were depending on GET_PAGE_STATE_LAST in order
to make the kernel compile again. We are only left with event type counters
in page state.
[akpm@osdl.org: bugfixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This makes nr_dirty a per zone counter. Looping over all processors is
avoided during writeback state determination.
The counter aggregation for nr_dirty had to be undone in the NFS layer since
we summed up the page counts from multiple zones. Someone more familiar with
NFS should probably review what I have done.
[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Same as with already do with the file operations: keep them in .rodata and
prevents people from doing runtime patching.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This reverts ccf01ef7aa commit.
No idea how git managed this one: when I asked it to merge the odirect
topic branch it actually generated a patch which reverted the change.
Reverting the 'merge' will once again reveal Chuck's recent NFS/O_DIRECT
work to the world.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Builds on ARM report link problems with common configurations like
statically linked NFS (for nfsroot). The symptom is that __init
section code references __exit section code; that won't work since
the exit sections are discarded (since they can never be called).
The best fix for these particular cases would be an "__init_or_exit"
section annotation.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Trond had apparently merged the same patch twice, causing a duplicate
include of the "internal.h" file, with resulting obvious confusion.
Tssk. I'm the only one allowed to send out trees that don't even
compile! Who does this Trond guy think he is?
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix various problems with nfs4 disabled. And various other things.
In file included from fs/nfs/inode.c:50:
fs/nfs/internal.h:24: error: static declaration of 'nfs_do_refmount' follows non-static declaration
include/linux/nfs_fs.h:320: error: previous declaration of 'nfs_do_refmount' was here
fs/nfs/internal.h:65: warning: 'struct nfs4_fs_locations' declared inside parameter list
fs/nfs/internal.h:65: warning: its scope is only this definition or declaration, which is probably not what you want
fs/nfs/internal.h: In function 'nfs4_path':
fs/nfs/internal.h:97: error: 'struct nfs_server' has no member named 'mnt_path'
fs/nfs/inode.c: In function 'init_once':
fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'open_states'
fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'delegation'
fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'delegation_state'
fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'rwsem'
distcc[26452] ERROR: compile fs/nfs/inode.c on g5/64 failed
make[1]: *** [fs/nfs/inode.o] Error 1
make: *** [fs/nfs/inode.o] Error 2
make: *** Waiting for unfinished jobs....
In file included from fs/nfs/nfs3xdr.c:26:
fs/nfs/internal.h:24: error: static declaration of 'nfs_do_refmount' follows non-static declaration
include/linux/nfs_fs.h:320: error: previous declaration of 'nfs_do_refmount' was here
fs/nfs/internal.h:65: warning: 'struct nfs4_fs_locations' declared inside parameter list
fs/nfs/internal.h:65: warning: its scope is only this definition or declaration, which is probably not what you want
fs/nfs/internal.h: In function 'nfs4_path':
fs/nfs/internal.h:97: error: 'struct nfs_server' has no member named 'mnt_path'
distcc[26486] ERROR: compile fs/nfs/nfs3xdr.c on g5/64 failed
make[1]: *** [fs/nfs/nfs3xdr.o] Error 1
make: *** [fs/nfs/nfs3xdr.o] Error 2
In file included from fs/nfs/nfs3proc.c:24:
fs/nfs/internal.h:24: error: static declaration of 'nfs_do_refmount' follows non-static declaration
include/linux/nfs_fs.h:320: error: previous declaration of 'nfs_do_refmount' was here
fs/nfs/internal.h:65: warning: 'struct nfs4_fs_locations' declared inside parameter list
fs/nfs/internal.h:65: warning: its scope is only this definition or declaration, which is probably not what you want
fs/nfs/internal.h: In function 'nfs4_path':
fs/nfs/internal.h:97: error: 'struct nfs_server' has no member named 'mnt_path'
distcc[26469] ERROR: compile fs/nfs/nfs3proc.c on bix/32 failed
make[1]: *** [fs/nfs/nfs3proc.o] Error 1
make: *** [fs/nfs/nfs3proc.o] Error 2
**FAILED**
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andreas Gruenbacher <agruen@suse.de>
Cc: Andy Adamson <andros@citi.umich.edu>
Cc: Chuck Lever <cel@netapp.com>
Cc: David Howells <dhowells@redhat.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Manoj Naik <manoj@almaden.ibm.com>
Cc: Marc Eshel <eshel@almaden.ibm.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Re-arrange the logic in the NFS direct I/O path so that nfs_read/write_data
structs are allocated just before they are scheduled, rather than
allocating them all at once before we start scheduling requests.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Neil Brown observed that the kmalloc() in nfs_get_user_pages() is more
likely to fail if the I/O is large enough to require the allocation of more
than a single page to keep track of all the pinned pages in the user's
buffer.
Instead of tracking one large page array per dreq/iocb, track pages per
nfs_read/write_data, just like the cached I/O path does. An array for
pages is already allocated for us by nfs_readdata_alloc() (and the write
and commit equivalents).
This is also required for adding support for vectored I/O to the NFS direct
I/O path.
The original reason to pin the user buffer and allocate all the NFS data
structures before trying to schedule I/O was to ensure all needed resources
are allocated on the client before starting to send requests. This reduces
the chance that resource exhaustion on the client will cause a short read
or write.
On the other hand, for an application making very large application I/O
requests, this means that it will be nearly impossible for the application
to make forward progress on a resource-limited client.
Thus, moving the buffer pinning functionality into the I/O scheduling
loops should be good for scalability. The next patch will do the same for
NFS data structure allocation.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean-up and fix a minor bug: the logic was dirtying page cache pages on
both read and write operations.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make the user_addr, user_count, and pos parameters explicit to the
scheduler routines, and remove the fields from nfs_direct_req. The
iovec API will be passing in a series of these, not just one set.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>