linux

Author	SHA1	Message	Date
Trond Myklebust	ecac799a5e	NFSv4: Fix the setlk error handler Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:36 -05:00
Trond Myklebust	b4410c2f7f	NFSv4.1: Fix the handling of the SEQUENCE status bits We want SEQUENCE status bits to be handled by the state manager in order to avoid threading issues. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:35 -05:00
Trond Myklebust	0400a6b0cb	NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses nfs4_schedule_state_recovery() should only be used when we need to force the state manager to check the lease. If we just want to start the state manager in order to handle a state recovery situation, we should be using nfs4_schedule_state_manager(). This patch fixes the abuses of nfs4_schedule_state_recovery() by replacing its use with a set of helper functions that do the right thing. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:22 -05:00
Andy Adamson	c34c32ea97	NFSv4.1 reclaim complete must wait for completion Signed-off-by: Andy Adamson <andros@netapp.com> [Trond: fix whitespace errors] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:05:01 -05:00
Andy Adamson	114f64b5f2	NFSv4: remove duplicate clientid in struct nfs_client Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:05:00 -05:00
Ricardo Labiaga	7d6d63d642	NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY Fix bug where we currently retry the EXCHANGEID call again, eventhough we already have a valid clientid. Instead, delay and retry the CREATE_SESSION call. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:59 -05:00
Frank Filz	3fa0b4e201	(try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid The problem was use of an int32, which when converted to a uint64 is sign extended resulting in a fileid that doesn't fit in 32 bits even though the intent of the function is to fit the fileid into 32 bits. Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> [Trond: Added an include for compat.h] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:58 -05:00
Jovi Zhang	43b7c3f051	nfs: fix compilation warning this commit fix compilation warning as following: linux-2.6/fs/nfs/nfs4proc.c:3265: warning: comparison of distinct pointer types lacks a cast Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:56 -05:00
Stanislav Fomichev	b9f810570d	nfs: add kmalloc return value check in decode_and_add_ds add kmalloc return value check in decode_and_add_ds Signed-off-by: Stanislav Fomichev <kernel@fomichev.me> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:55 -05:00
Jeff Layton	d2224e7afb	nfs: close NFSv4 COMMIT vs. CLOSE race I've been adding in more artificial delays in the NFSv4 commit and close codepaths to uncover races. The kernel I'm testing has the patch to close the race in __rpc_wait_for_completion_task that's in Trond's cthon2011 branch. The reproducer I've been using does this in a loop: mkdir("DIR"); fd = open("DIR/FILE", O_WRONLY\|O_CREAT\|O_EXCL, 0644); write(fd, "abcdefg", 7); close(fd); unlink("DIR/FILE"); rmdir("DIR"); The above reproducer shouldn't result in any silly-renaming. However, when I add a "msleep(100)" just after the nfs_commit_clear_lock call in nfs_commit_release, I can almost always force one to occur. If I can force it to occur with that, then it can happen without that delay given the right timing. nfs_commit_inode waits for the NFS_INO_COMMIT bit to clear when called with FLUSH_SYNC set. nfs_commit_rpcsetup on the other hand does not wait for the task to complete before putting its reference to it, so the last reference get put in rpc_release task and gets queued to a workqueue. In this situation, the last open context reference may be put by the COMMIT release instead of the close() syscall. The close() syscall returns too quickly and the unlink runs while the d_count is still high since the COMMIT release hasn't put its dentry reference yet. Fix this by having rpc_commit_rpcsetup wait for the RPC call to complete before putting the task reference when FLUSH_SYNC is set. With this, the last reference is put by the process that's initiating the FLUSH_SYNC commit and the race is closed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:53 -05:00
Trond Myklebust	bf294b41ce	SUNRPC: Close a race in __rpc_wait_for_completion_task() Although they run as rpciod background tasks, under normal operation (i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck() and nfs4_do_close() want to be fully synchronous. This means that when we exit, we want all references to the rpc_task to be gone, and we want any dentry references etc. held by that task to be released. For this reason these functions call __rpc_wait_for_completion_task(), followed by rpc_put_task() in the expectation that the latter will be releasing the last reference to the rpc_task, and thus ensuring that the callback_ops->rpc_release() has been called synchronously. This patch fixes a race which exists due to the fact that rpciod calls rpc_complete_task() (in order to wake up the callers of __rpc_wait_for_completion_task()) and then subsequently calls rpc_put_task() without ensuring that these two steps are done atomically. In order to avoid adding new spin locks, the patch uses the existing waitqueue spin lock to order the rpc_task reference count releases between the waiting process and rpciod. The common case where nobody is waiting for completion is optimised for by checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task reference count is 1: in those cases we drop trying to grab the spin lock, and immediately free up the rpc_task. Those few processes that need to put the rpc_task from inside an asynchronous context and that do not care about ordering are given a new helper: rpc_put_task_async(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:52 -05:00
Neil Horman	e9e3d724e2	nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3) The "bad_page()" page allocator sanity check was reported recently (call chain as follows): bad_page+0x69/0x91 free_hot_cold_page+0x81/0x144 skb_release_data+0x5f/0x98 __kfree_skb+0x11/0x1a tcp_ack+0x6a3/0x1868 tcp_rcv_established+0x7a6/0x8b9 tcp_v4_do_rcv+0x2a/0x2fa tcp_v4_rcv+0x9a2/0x9f6 do_timer+0x2df/0x52c ip_local_deliver+0x19d/0x263 ip_rcv+0x539/0x57c netif_receive_skb+0x470/0x49f :virtio_net:virtnet_poll+0x46b/0x5c5 net_rx_action+0xac/0x1b3 __do_softirq+0x89/0x133 call_softirq+0x1c/0x28 do_softirq+0x2c/0x7d do_IRQ+0xec/0xf5 default_idle+0x0/0x50 ret_from_intr+0x0/0xa default_idle+0x29/0x50 cpu_idle+0x95/0xb8 start_kernel+0x220/0x225 _sinittext+0x22f/0x236 It occurs because an skb with a fraglist was freed from the tcp retransmit queue when it was acked, but a page on that fraglist had PG_Slab set (indicating it was allocated from the Slab allocator (which means the free path above can't safely free it via put_page. We tracked this back to an nfsv4 setacl operation, in which the nfs code attempted to fill convert the passed in buffer to an array of pages in __nfs4_proc_set_acl, which gets used by the skb->frags list in xs_sendpages. __nfs4_proc_set_acl just converts each page in the buffer to a page struct via virt_to_page, but the vfs allocates the buffer via kmalloc, meaning the PG_slab bit is set. We can't create a buffer with kmalloc and free it later in the tcp ack path with put_page, so we need to either: 1) ensure that when we create the list of pages, no page struct has PG_Slab set or 2) not use a page list to send this data Given that these buffers can be multiple pages and arbitrarily sized, I think (1) is the right way to go. I've written the below patch to allocate a page from the buddy allocator directly and copy the data over to it. This ensures that we have a put_page free-able page for every entry that winds up on an skb frag list, so it can be safely freed when the frame is acked. We do a put page on each entry after the rpc_call_sync call so as to drop our own reference count to the page, leaving only the ref count taken by tcp_sendpages. This way the data will be properly freed when the ack comes in Successfully tested by myself to solve the above oops. Note, as this is the result of a setacl operation that exceeded a page of data, I think this amounts to a local DOS triggerable by an uprivlidged user, so I'm CCing security on this as well. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Trond Myklebust <Trond.Myklebust@netapp.com> CC: security@kernel.org CC: Jeff Layton <jlayton@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-04 17:28:52 -08:00
Tejun Heo	43d133c18b	Merge branch 'master' into for-2.6.39	2011-02-21 09:43:56 +01:00
Chuck Lever	d1205f87bb	NFS: NFSv4 readdir loses entries On recent 2.6.38-rc kernels, connectathon basic test 6 fails on NFSv4 mounts of OpenSolaris with something like: > ./test6: readdir > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.12' dir entry, pass 0 > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.82' dir entry, pass 0 > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.164' dir entry, pass 0 > ./test6: (/mnt/klimt/matisse.test) Test failed with 3 errors > basic tests failed > Tests failed, leaving /mnt/klimt mounted > [cel@matisse cthon04]$ I narrowed the problem down to nfs4_decode_dirent() reporting that the decode buffer had overflowed while decoding the entries for those missing files. verify_attr_len() assumes both it's pointer arguments reside on the same page. When these arguments point to locations on two different pages, verify_attr_len() can report false errors. This can happen now that a large NFSv4 readdir result can span pages. We have reasonably good checking in nfs4_decode_dirent() anyway, so it should be safe to simply remove the extra checking. At a guess, this was introduced by commit `6650239a`, "NFS: Don't use vm_map_ram() in readdir". Cc: stable@kernel.org [2.6.37] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-28 13:41:35 -05:00
Chuck Lever	c08e76d0cd	NFS: Micro-optimize nfs4_decode_dirent() Make the decoding of NFSv4 directory entries slightly more efficient by: 1. Avoiding unnecessary byte swapping when checking XDR booleans, and 2. Not bumping "p" when its value will be immediately replaced by xdr_inline_decode() This commit makes nfs4_decode_dirent() consistent with similar logic in the other two decode_dirent() functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-28 13:37:35 -05:00
Trond Myklebust	e00b8a2404	NFS: Fix an NFS client lockdep issue There is no reason to be freeing the delegation cred in the rcu callback, and doing so is resulting in a lockdep complaint that rpc_credcache_lock is being called from both softirq and non-softirq contexts. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2011-01-28 13:37:09 -05:00
Andy Adamson	c7a360b05b	NFS construct consistent co_ownerid for v4.1 As stated in section 2.4 of RFC 5661, subsequent instances of the client need to present the same co_ownerid. Concatinate the client's IP dot address, host name, and the rpc_auth pseudoflavor to form the co_ownerid. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 22:49:14 -05:00
Trond Myklebust	27dc1cd3ad	NFS: nfs_wcc_update_inode() should set nfsi->attr_gencount If the call to nfs_wcc_update_inode() results in an attribute update, we need to ensure that the inode's attr_gencount gets bumped too, otherwise we are not protected against races with other GETATTR calls. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:28:21 -05:00
Andy Adamson	b2a2897dc4	NFS improve pnfs_put_deviceid_cache debug print What we really want to know is the ref count. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:26:51 -05:00
Andy Adamson	2c4cdf8f6d	NFS fix cb_sequence error processing Always assign the cb_process_state nfs_client pointer so a processing error in cb_sequence after the nfs_client is found and referenced returns a non-NULL cb_process_state nfs_client and the matching nfs_put_client in nfs4_callback_compound dereferences the client. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:26:51 -05:00
Andy Adamson	778be232a2	NFS do not find client in NFSv4 pg_authenticate The information required to find the nfs_client cooresponding to the incoming back channel request is contained in the NFS layer. Perform minimal checking in the RPC layer pg_authenticate method, and push more detailed checking into the NFS layer where the nfs_client can be found. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:26:51 -05:00
Chuck Lever	f61f6da0d5	NFS: Prevent memory allocation failure in nfsacl_encode() nfsacl_encode() allocates memory in certain cases. This of course is not guaranteed to work. Since commit `9f06c719` "SUNRPC: New xdr_streams XDR encoder API", the kernel's XDR encoders can't return a result indicating possibly a failure, so a memory allocation failure in nfsacl_encode() has become fatal (ie, the XDR code Oopses) in some cases. However, the allocated memory is a tiny fixed amount, on the order of 40-50 bytes. We can easily use a stack-allocated buffer for this, with only a wee bit of nose-holding. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:24:47 -05:00
Chuck Lever	ee5dc7732b	NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!" Milan Broz <mbroz@redhat.com> reports: > on today Linus' tree I get OOps if using nfs. > > server (2.6.36) exports dir: > /dir 172.16.1.0/24(rw,async,all_squash,no_subtree_check,anonuid=500,anongid=500) > > on client it is mounted in fstab > server:/dir /mnt/tst nfs rw,soft 0 0 > > and these commands OOpses it (simplified from a configure script): > > cd /dir > touch x > install x y > > [ 105.327701] ------------[ cut here ]------------ > [ 105.327979] kernel BUG at fs/nfs/nfs3xdr.c:1338! > [ 105.328075] invalid opcode: 0000 [#1] PREEMPT SMP > [ 105.328223] last sysfs file: /sys/devices/virtual/bdi/0:16/uevent > [ 105.328349] Modules linked in: usbcore dm_mod > [ 105.328553] > [ 105.328678] Pid: 3710, comm: install Not tainted 2.6.37+ #423 440BX Desktop Reference Platform/VMware Virtual Platform > [ 105.328853] EIP: 0060:[<c116c06c>] EFLAGS: 00010282 CPU: 0 > [ 105.329152] EIP is at nfs3_xdr_enc_setacl3args+0x61/0x98 > [ 105.329249] EAX: ffffffea EBX: ce941d98 ECX: 00000000 EDX: 00000004 > [ 105.329340] ESI: ce941cd0 EDI: 000000a4 EBP: ce941cc0 ESP: ce941cb4 > [ 105.329431] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 > [ 105.329525] Process install (pid: 3710, ti=ce940000 task=ced36f20 task.ti=ce940000) > [ 105.336600] Stack: > [ 105.336693] ce941cd0 ce9dc000 00000000 ce941cf8 c12ecd02 c12f43e0 c116c00b cf754158 > [ 105.336982] ce9dc004 cf754284 ce9dc004 cf7ffee8 ceff9978 ce9dc000 cf7ffee8 ce9dc000 > [ 105.337182] ce9dc000 ce941d14 c12e698d cf75412c ce941d98 cf7ffee8 cf7fff20 00000000 > [ 105.337405] Call Trace: > [ 105.337695] [<c12ecd02>] rpcauth_wrap_req+0x75/0x7f > [ 105.337806] [<c12f43e0>] ? xdr_encode_opaque+0x12/0x15 > [ 105.337898] [<c116c00b>] ? nfs3_xdr_enc_setacl3args+0x0/0x98 > [ 105.337988] [<c12e698d>] call_transmit+0x17e/0x1e8 > [ 105.338072] [<c12ec307>] __rpc_execute+0x6d/0x1a6 > [ 105.338155] [<c12ec474>] rpc_execute+0x34/0x37 > [ 105.338235] [<c12e738d>] rpc_run_task+0xb5/0xbd > [ 105.338316] [<c12e7474>] rpc_call_sync+0x3d/0x58 > [ 105.338402] [<c116d0c6>] nfs3_proc_setacls+0x18e/0x24f > [ 105.338493] [<c10b3f76>] ? __kmalloc+0x148/0x1c4 > [ 105.338579] [<c10ecd01>] ? posix_acl_alloc+0x12/0x22 > [ 105.338665] [<c116d5c8>] nfs3_proc_setacl+0xa0/0xca > [ 105.338748] [<c116d69c>] nfs3_setxattr+0x62/0x88 > [ 105.338834] [<c1317042>] ? sub_preempt_count+0x7c/0x89 > [ 105.338926] [<c116d63a>] ? nfs3_setxattr+0x0/0x88 > [ 105.339026] [<c10cfa79>] __vfs_setxattr_noperm+0x26/0x95 > [ 105.339114] [<c10cfb43>] vfs_setxattr+0x5b/0x76 > [ 105.339211] [<c10cfbfb>] setxattr+0x9d/0xc3 > [ 105.339298] [<c10a2ea8>] ? handle_pte_fault+0x258/0x5cb > [ 105.339428] [<c1091ff6>] ? __free_pages+0x1a/0x23 > [ 105.339517] [<c10498ea>] ? up_read+0x16/0x2c > [ 105.339599] [<c10b8365>] ? fget+0x0/0xa3 > [ 105.339677] [<c10b8365>] ? fget+0x0/0xa3 > [ 105.339760] [<c1025d23>] ? get_parent_ip+0xb/0x31 > [ 105.339843] [<c1317042>] ? sub_preempt_count+0x7c/0x89 > [ 105.339931] [<c10cfc72>] sys_fsetxattr+0x51/0x79 > [ 105.340014] [<c1002853>] sysenter_do_call+0x12/0x32 > [ 105.340133] Code: 2e 76 18 00 58 31 d2 8b 7f 28 f6 43 04 01 74 03 8b 53 08 6a 00 8b 46 04 6a 01 8b 0b 52 89 fa e8 85 10 f8 ff 83 c4 0c 85 c0 79 04 <0f> 0b eb fe 31 c9 f6 43 04 04 74 03 8b 4b 0c 68 00 10 00 00 8d > [ 105.350321] EIP: [<c116c06c>] nfs3_xdr_enc_setacl3args+0x61/0x98 SS:ESP 0068:ce941cb4 > [ 105.364385] ---[ end trace 01fcfe7f0f7f6e4a ]--- nfs3_xdr_enc_setacl3args() is not properly setting up the target buffer before nfsacl_encode() attempts to encode the ACL. Introduced by commit `d9c407b1` "NFS: Introduce new-style XDR encoding functions for NFSv3." Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:24:47 -05:00
Chuck Lever	839f7ad693	NFS: Fix "kernel BUG at fs/aio.c:554!" Nick Piggin reports: > I'm getting use after frees in aio code in NFS > > [ 2703.396766] Call Trace: > [ 2703.396858] [<ffffffff8100b057>] ? native_sched_clock+0x27/0x80 > [ 2703.396959] [<ffffffff8108509e>] ? put_lock_stats+0xe/0x40 > [ 2703.397058] [<ffffffff81088348>] ? lock_release_holdtime+0xa8/0x140 > [ 2703.397159] [<ffffffff8108a2a5>] lock_acquire+0x95/0x1b0 > [ 2703.397260] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60 > [ 2703.397361] [<ffffffff81039701>] ? get_parent_ip+0x11/0x50 > [ 2703.397464] [<ffffffff81612a31>] _raw_spin_lock_irq+0x41/0x80 > [ 2703.397564] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60 > [ 2703.397662] [<ffffffff811627db>] aio_put_req+0x2b/0x60 > [ 2703.397761] [<ffffffff811647fe>] do_io_submit+0x2be/0x7c0 > [ 2703.397895] [<ffffffff81164d0b>] sys_io_submit+0xb/0x10 > [ 2703.397995] [<ffffffff8100307b>] system_call_fastpath+0x16/0x1b > > Adding some tracing, it is due to nfs completing the request then > returning something other than -EIOCBQUEUED, so aio.c > also completes the request. To address this, prevent the NFS direct I/O engine from completing async iocbs when the forward path returns an error without starting any I/O. This fix appears to survive ^C during both "xfstest no. 208" and "fsx -Z." It's likely this bug has existed for a very long while, as we are seeing very similar symptoms in OEL 5. Copying stable. Cc: Stable <stable@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:24:47 -05:00
Jesper Juhl	ad3d2eedf0	NFS4: Avoid potential NULL pointer dereference in decode_and_add_ds(). On Mon, 17 Jan 2011, Mi Jinlong wrote: > > > Jesper Juhl: > > strrchr() can return NULL if nothing is found. If this happens we'll > > dereference a NULL pointer in > > fs/nfs/nfs4filelayoutdev.c::decode_and_add_ds(). > > > > I tried to find some other code that guarantees that this can never > > happen but I was unsuccessful. So, unless someone else can point to some > > code that ensures this can never be a problem, I believe this patch is > > needed. > > > > While I was changing this code I also noticed that all the dprintk() > > statements, except one, start with "%s:". The one missing the ":" I added > > it to. > > Maybe another one also should be changed at decode_and_add_ds() at line 243: > > 243 printk("%s Decoded address and port %s\n", __func__, buf); > Missed that one. Thanks. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:24:46 -05:00
Tejun Heo	ada609ee2a	workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER WQ_RESCUER is now an internal flag and should only be used in the workqueue implementation proper. Use WQ_MEM_RECLAIM instead. This doesn't introduce any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: dm-devel@redhat.com Cc: Neil Brown <neilb@suse.de>	2011-01-25 14:35:54 +01:00
Fred Isaman	0da2a4ac33	NFS: fix handling of malloc failure during nfs_flush_multi() Cleanup of the allocated list entries should not call put_nfs_open_context() on each entry, as the context will always be NULL, causing an oops. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-19 15:37:49 -05:00
David Howells	ea5b778a8b	Unexport do_add_mount() and add in follow_automount(), not ->d_automount() Unexport do_add_mount() and make ->d_automount() return the vfsmount to be added rather than calling do_add_mount() itself. follow_automount() will then do the addition. This slightly complicates things as ->d_automount() normally wants to add the new vfsmount to an expiration list and start an expiration timer. The problem with that is that the vfsmount will be deleted if it has a refcount of 1 and the timer will not repeat if the expiration list is empty. To this end, we require the vfsmount to be returned from d_automount() with a refcount of (at least) 2. One of these refs will be dropped unconditionally. In addition, follow_automount() must get a 3rd ref around the call to do_add_mount() lest it eat a ref and return an error, leaving the mount we have open to being expired as we would otherwise have only 1 ref on it. d_automount() should also add the the vfsmount to the expiration list (by calling mnt_set_expiry()) and start the expiration timer before returning, if this mechanism is to be used. The vfsmount will be unlinked from the expiration list by follow_automount() if do_add_mount() fails. This patch also fixes the call to do_add_mount() for AFS to propagate the mount flags from the parent vfsmount. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:48 -05:00
David Howells	36d43a4376	NFS: Use d_automount() rather than abusing follow_link() Make NFS use the new d_automount() dentry operation rather than abusing follow_link() on directories. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:34 -05:00
David Howells	cc53ce53c8	Add a dentry op to allow processes to be held during pathwalk transit Add a dentry op (d_manage) to permit a filesystem to hold a process and make it sleep when it tries to transit away from one of that filesystem's directories during a pathwalk. The operation is keyed off a new dentry flag (DCACHE_MANAGE_TRANSIT). The filesystem is allowed to be selective about which processes it holds and which it permits to continue on or prohibits from transiting from each flagged directory. This will allow autofs to hold up client processes whilst letting its userspace daemon through to maintain the directory or the stuff behind it or mounted upon it. The ->d_manage() dentry operation: int (d_manage)(struct path path, bool mounting_here); takes a pointer to the directory about to be transited away from and a flag indicating whether the transit is undertaken by do_add_mount() or do_move_mount() skipping through a pile of filesystems mounted on a mountpoint. It should return 0 if successful and to let the process continue on its way; -EISDIR to prohibit the caller from skipping to overmounted filesystems or automounting, and to use this directory; or some other error code to return to the user. ->d_manage() is called with namespace_sem writelocked if mounting_here is true and no other locks held, so it may sleep. However, if mounting_here is true, it may not initiate or wait for a mount or unmount upon the parameter directory, even if the act is actually performed by userspace. Within fs/namei.c, follow_managed() is extended to check with d_manage() first on each managed directory, before transiting away from it or attempting to automount upon it. follow_down() is renamed follow_down_one() and should only be used where the filesystem deliberately intends to avoid management steps (e.g. autofs). A new follow_down() is added that incorporates the loop done by all other callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS and CIFS do use it, their use is removed by converting them to use d_automount()). The new follow_down() calls d_manage() as appropriate. It also takes an extra parameter to indicate if it is being called from mount code (with namespace_sem writelocked) which it passes to d_manage(). follow_down() ignores automount points so that it can be used to mount on them. __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have that determine whether to abort or not itself. That would allow the autofs daemon to continue on in rcu-walk mode. Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't required as every tranist from that directory will cause d_manage() to be invoked. It can always be set again when necessary. ========================== WHAT THIS MEANS FOR AUTOFS ========================== Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to trigger the automounting of indirect mounts, and both of these can be called with i_mutex held. autofs knows that the i_mutex will be held by the caller in lookup(), and so can drop it before invoking the daemon - but this isn't so for d_revalidate(), since the lock is only held on _some_ of the code paths that call it. This means that autofs can't risk dropping i_mutex from its d_revalidate() function before it calls the daemon. The bug could manifest itself as, for example, a process that's trying to validate an automount dentry that gets made to wait because that dentry is expired and needs cleaning up: mkdir S ffffffff8014e05a 0 32580 24956 Call Trace: [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f [<ffffffff80057a2f>] lookup_create+0x46/0x80 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4 versus the automount daemon which wants to remove that dentry, but can't because the normal process is holding the i_mutex lock: automount D ffffffff8014e05a 0 32581 1 32561 Call Trace: [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14 [<ffffffff800e6d55>] do_rmdir+0x77/0xde [<ffffffff8005d229>] tracesys+0x71/0xe0 [<ffffffff8005d28d>] tracesys+0xd5/0xe0 which means that the system is deadlocked. This patch allows autofs to hold up normal processes whilst the daemon goes ahead and does things to the dentry tree behind the automouter point without risking a deadlock as almost no locks are held in d_manage() and none in d_automount(). Signed-off-by: David Howells <dhowells@redhat.com> Was-Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:31 -05:00
Linus Torvalds	db9effe99a	Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: fs: fix do_last error case when need_reval_dot nfs: add missing rcu-walk check fs: hlist UP debug fixup fs: fix dropping of rcu-walk from force_reval_path fs: force_reval_path drop rcu-walk before d_invalidate fs: small rcu-walk documentation fixes Fixed up trivial conflicts in Documentation/filesystems/porting	2011-01-13 20:14:13 -08:00
Nick Piggin	657e94b673	nfs: add missing rcu-walk check Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-14 02:48:39 +00:00
Trond Myklebust	8a0eebf66e	NFS: Fix NFSv3 exclusive open semantics Commit `c0204fd2b8` (NFS: Clean up nfs4_proc_create()) broke NFSv3 exclusive open by removing the code that passes the O_EXCL flag down to nfs3_proc_create(). This patch reverts that offending hunk from the original commit. Reported-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org [2.6.37] Tested-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 12:06:29 -08:00
Al Viro	8b244ff2fa	switch nfs to ->s_d_op Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:02:45 -05:00
Linus Torvalds	b9d919a4ac	Merge branch 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (89 commits) NFS fix the setting of exchange id flag NFS: Don't use vm_map_ram() in readdir NFSv4: Ensure continued open and lockowner name uniqueness NFS: Move cl_delegations to the nfs_server struct NFS: Introduce nfs_detach_delegations() NFS: Move cl_state_owners and related fields to the nfs_server struct NFS: Allow walking nfs_client.cl_superblocks list outside client.c pnfs: layout roc code pnfs: update nfs4_callback_recallany to handle layouts pnfs: add CB_LAYOUTRECALL handling pnfs: CB_LAYOUTRECALL xdr code pnfs: change lo refcounting to atomic_t pnfs: check that partial LAYOUTGET return is ignored pnfs: add layout to client list before sending rpc pnfs: serialize LAYOUTGET(openstateid) pnfs: layoutget rpc code cleanup pnfs: change how lsegs are removed from layout list pnfs: change layout state seqlock to a spinlock pnfs: add prefix to struct pnfs_layout_hdr fields pnfs: add prefix to struct pnfs_layout_segment fields ...	2011-01-11 15:11:56 -08:00
Andy Adamson	357f54d6b3	NFS fix the setting of exchange id flag Indicate support for referrals. Do not set any PNFS roles. Check the flags returned by the server for validity. Do not use exchange flags from an old client ID instance when recovering a client ID. Update the EXCHID4_FLAG_XXX set to RFC 5661. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-11 14:17:09 -05:00
Trond Myklebust	68c404b18f	Merge branch 'bugfixes' into nfs-for-2.6.38 Conflicts: fs/nfs/nfs2xdr.c fs/nfs/nfs3xdr.c fs/nfs/nfs4xdr.c	2011-01-10 14:48:02 -05:00
Trond Myklebust	6650239a4b	NFS: Don't use vm_map_ram() in readdir vm_map_ram() is not available on NOMMU platforms, and causes trouble on incoherrent architectures such as ARM when we access the page data through both the direct and the virtual mapping. The alternative is to use the direct mapping to access page data for the case when we are not crossing a page boundary, but to copy the data into a linear scratch buffer when we are accessing data that spans page boundaries. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Marc Kleine-Budde <mkl@pengutronix.de> Cc: stable@kernel.org [2.6.37]	2011-01-10 14:45:01 -05:00
Nick Piggin	873feea09e	fs: dcache per-inode inode alias locking dcache_inode_lock can be replaced with per-inode locking. Use existing inode->i_lock for this. This is slightly non-trivial because we sometimes need to find the inode from the dentry, which requires d_inode to be stabilised (either with refcount or d_lock). Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:31 +11:00
Nick Piggin	b74c79e993	fs: provide rcu-walk aware permission i_ops Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:29 +11:00
Nick Piggin	34286d6662	fs: rcu-walk aware d_revalidate method Require filesystems be aware of .d_revalidate being called in rcu-walk mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning -ECHILD from all implementations. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:29 +11:00
Nick Piggin	fb045adb99	fs: dcache reduce branches in lookup path Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])d_op([[:space:]])=' \| xargs sed -e 's/$[^\t ]$->d_op = $.$;/d_set_d_op(\1, \2);/' -e 's/$[^\t ]$\.d_op = $.$;/d_set_d_op(\&\1, \2);/' -i Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:28 +11:00
Nick Piggin	fa0d7e3de6	fs: icache RCU free inodes RCU free the struct inode. This will allow: - Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must. - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking. - Could remove some nested trylock loops in dcache code - Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping. The downsides of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (ie. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:26 +11:00
Nick Piggin	b5c84bf6f6	fs: dcache remove dcache_lock dcache_lock no longer protects anything. remove it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:23 +11:00
Nick Piggin	949854d024	fs: Use rename lock and RCU for multi-step operations The remaining usages for dcache_lock is to allow atomic, multi-step read-side operations over the directory tree by excluding modifications to the tree. Also, to walk in the leaf->root direction in the tree where we don't have a natural d_lock ordering. This could be accomplished by taking every d_lock, but this would mean a huge number of locks and actually gets very tricky. Solve this instead by using the rename seqlock for multi-step read-side operations, retry in case of a rename so we don't walk up the wrong parent. Concurrent dentry insertions are not serialised against. Concurrent deletes are tricky when walking up the directory: our parent might have been deleted when dropping locks so also need to check and retry for that. We can also use the rename lock in cases where livelock is a worry (and it is introduced in subsequent patch). Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:22 +11:00
Nick Piggin	b23fb0a603	fs: scale inode alias list Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list from concurrent modification. d_alias is also protected by d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:22 +11:00
Nick Piggin	b7ab39f631	fs: dcache scale dentry refcount Make d_count non-atomic and protect it with d_lock. This allows us to ensure a 0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when we start protecting many other dentry members with d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:21 +11:00
Nick Piggin	fe15ce446b	fs: change d_delete semantics Change d_delete from a dentry deletion notification to a dentry caching advise, more like ->drop_inode. Require it to be constant and idempotent, and not take d_lock. This is how all existing filesystems use the callback anyway. This makes fine grained dentry locking of dput and dentry lru scanning much simpler. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:18 +11:00
Trond Myklebust	d035c36c58	NFSv4: Ensure continued open and lockowner name uniqueness In order to enable migration support, we will want to move some of the structures that are subject to migration into the struct nfs_server. In particular, if we are to move the state_owner and state_owner_id to being a per-filesystem structure, then we should label the resulting open/lock owners with a per-filesytem label to ensure global uniqueness. This patch does so by adding the super block s_dev to the open/lock owner name. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 16:03:13 -05:00
Chuck Lever	d3978bb325	NFS: Move cl_delegations to the nfs_server struct Delegations are per-inode, not per-nfs_client. When a server file system is migrated, delegations on the client must be moved from the source to the destination nfs_server. Make it easier to manage a mount point's delegation list across a migration event by moving the list to the nfs_server struct. Clean up: I added documenting comments to public functions I changed in this patch. For consistency I added comments to all the other public functions in fs/nfs/delegation.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:57:46 -05:00
Chuck Lever	dda4b22562	NFS: Introduce nfs_detach_delegations() Clean up: Refactor code that takes clp->cl_lock and calls nfs_detach_delegations_locked() into its own function. While we're changing the call sites, get rid of the second parameter and the logic in nfs_detach_delegations_locked() that uses it, since callers always set that parameter of nfs_detach_delegations_locked() to NULL. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:47:57 -05:00
Chuck Lever	24d292b894	NFS: Move cl_state_owners and related fields to the nfs_server struct NFSv4 migration needs to reassociate state owners from the source to the destination nfs_server data structures. To make that easier, move the cl_state_owners field to the nfs_server struct. cl_openowner_id and cl_lockowner_id accompany this move, as they are used in conjunction with cl_state_owners. The cl_lock field in the parent nfs_client continues to protect all three of these fields. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:47:57 -05:00
Chuck Lever	fca5238ef3	NFS: Allow walking nfs_client.cl_superblocks list outside client.c We're about to move some fields from struct nfs_client to struct nfs_server. There is a many-to-one relationship between nfs_servers and nfs_clients. After these fields are moved to the nfs_server struct, to visit all of the data in these fields that is owned by one nfs_client, code will need to visit each nfs_server on the cl_superblocks list for that nfs_client. To serialize changes to the cl_superblocks list during these little expeditions, protect the list with RCU. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:47:56 -05:00
Fred Isaman	f7e8917a67	pnfs: layout roc code A layout can request return-on-close. How this interacts with the forgetful model of never sending LAYOUTRETURNS is a bit ambiguous. We forget any layouts marked roc, and wait for them to be completely forgotten before continuing with the close. In addition, to compensate for races with any inflight LAYOUTGETs, and the fact that we do not get any layout stateid back from the server, we set the barrier to the worst case scenario of current_seqid + number of outstanding LAYOUTGETS. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:32 -05:00
Alexandros Batsakis	3684037084	pnfs: update nfs4_callback_recallany to handle layouts While here, update the code a bit. Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:32 -05:00
Fred Isaman	43f1b3da8b	pnfs: add CB_LAYOUTRECALL handling This is the heart of the wave 2 submission. Add the code to trigger drain and forget of any afected layouts. In addition, we set a "barrier", below which any LAYOUTGET reply is ignored. This is to compensate for the fact that we do not wait for outstanding LAYOUTGETs to complete as per section 12.5.5.2.1 of RFC 5661. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:32 -05:00
Fred Isaman	f2a6256160	pnfs: CB_LAYOUTRECALL xdr code This is the xdr decoding for CB_LAYOUTRECALL. Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:32 -05:00
Fred Isaman	cc6e5340b0	pnfs: change lo refcounting to atomic_t This will be required to allow us to grab reference outside of i_lock. While we are at it, make put_layout_hdr take the same argument as all the related functions. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:32 -05:00
Fred Isaman	fc1794c5b0	pnfs: check that partial LAYOUTGET return is ignored Either a bad server reply, or our ignoring of multiple array segments in a reply, can cause a reply to not meet our requirements. Ensure that we ignore such replies. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:32 -05:00
Fred Isaman	2130ff6636	pnfs: add layout to client list before sending rpc Since this list will be used to search for layouts to recall, this is necessary to avoid a race where the recall comes in, sees there is nothing in the client list, and prepares to return NOMATCHING, while the LAYOUTGET gets processed before the recall updates the stateid. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:32 -05:00
Fred Isaman	cf7d63f1f9	pnfs: serialize LAYOUTGET(openstateid) We shouldn't send a LAYOUTGET(openstateid) unless all outstanding RPCs using the previous stateid are completed. This requires choosing the stateid to encode earlier, so we can abort if one is not available (we want to use the open stateid, but a LAYOUTGET is already out using it), and adding a count of the number of outstanding rpc calls using layout state (which for now consist solely of LAYOUTGETs). Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:31 -05:00
Fred Isaman	c31663d4a1	pnfs: layoutget rpc code cleanup No functional changes, just some code minor code rearrangement and comments. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:31 -05:00
Fred Isaman	4541d16c02	pnfs: change how lsegs are removed from layout list This is to prepare the way for sensible io draining. Instead of just removing the lseg from the list, we instead clear the VALID flag (preventing new io from grabbing references to the lseg) and remove the reference holding it in the list. Thus the lseg will be removed once any io in progress completes and any references still held are dropped. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:31 -05:00
Fred Isaman	fd6002e9b8	pnfs: change layout state seqlock to a spinlock This prepares for future changes, where the layout state needs to change atomically with several other variables. In particular, it will need to know if lo->segs is empty, as we test that instead of manipulating the NFS_LAYOUT_STATEID_SET bit. Moreover, the layoutstateid is not really a read-mostly structure, as it is written almost as often as it is read. The behavior of pnfs_get_layout_stateid is also slightly changed, so that it no longer changes the stateid. Its name is changed to +pnfs_choose_layoutget_stateid. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:31 -05:00
Fred Isaman	b7edfaa198	pnfs: add prefix to struct pnfs_layout_hdr fields Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:31 -05:00
Fred Isaman	566052c53b	pnfs: add prefix to struct pnfs_layout_segment fields While we are renaming all the fields, change lo->state to lo->plh_flags. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:31 -05:00
Fred Isaman	daaa82d1c7	pnfs: remove unnecessary field lgp->status Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:30 -05:00
Fred Isaman	52fabd7319	pnfs: fix incorrect comment in destroy_lseg Comment references get_layout_hdr_locked, which never existed in submitted code. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:30 -05:00
Andy Adamson	4a19de0f4b	NFS rename client back channel transport field Differentiate from server backchannel Signed-off-by: Andy Adamson <andros@netapp.com> Acked-by: Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:25 -05:00
Andy Adamson	42acd02182	NFS add session back channel draining Currently session draining only drains the fore channel. The back channel processing must also be drained. Use the back channel highest_slot_used to indicate that a callback is being processed by the callback thread. Move the session complete to be per channel. When the session is draininig, wait for any current back channel processing to complete and stop all new back channel processing by returning NFS4ERR_DELAY to the back channel client. Drain the back channel, then the fore channel. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:25 -05:00
Andy Adamson	ece0de633c	NFS RPC_AUTH_GSS unsupported on v4.1 back channel Signed-off-by: Andy Adamson <andros@netapp.com> Acked-by: Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:24 -05:00
Andy Adamson	c36fca52f5	NFS refactor nfs_find_client and reference client across callback processing Fixes a bug where the nfs_client could be freed during callback processing. Refactor nfs_find_client to use minorversion specific means to locate the correct nfs_client structure. In the NFS layer, V4.0 clients are found using the callback_ident field in the CB_COMPOUND header. V4.1 clients are found using the sessionID in the CB_SEQUENCE operation which is also compared against the sessionID associated with the back channel thread after a successful CREATE_SESSION. Each of these methods finds the one an only nfs_client associated with the incoming callback request - so nfs_find_client_next is not needed. In the RPC layer, the pg_authenticate call needs to find the nfs_client. For the v4.0 callback service, the callback identifier has not been decoded so a search by address, version, and minorversion is used. The sessionid for the sessions based callback service has (usually) not been set for the pg_authenticate on a CB_NULL call which can be sent prior to the return of a CREATE_SESSION call, so the sessionid associated with the back channel thread is not used to find the client in pg_authenticate for CB_NULL calls. Pass the referenced nfs_client to each CB_COMPOUND operation being proceesed via the new cb_process_state structure. The reference is held across cb_compound processing. Use the new cb_process_state struct to move the NFS4ERR_RETRY_UNCACHED_REP processing from process_op into nfs4_callback_sequence where it belongs. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:24 -05:00
Andy Adamson	2c2618c6f2	NFS associate sessionid with callback connection The sessions based callback service is started prior to the CREATE_SESSION call so that it can handle CB_NULL requests which can be sent before the CREATE_SESSION call returns and the session ID is known. Set the callback sessionid after a sucessful CREATE_SESSION. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:24 -05:00
Andy Adamson	f4eecd5da3	NFS implement v4.0 callback_ident Use the small id to pointer translator service to provide a unique callback identifier per SETCLIENTID call used to identify the v4.0 callback service associated with the clientid. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:24 -05:00
Andy Adamson	ea00528126	NFS do not clear minor version at nfs_client free Resetting the client minor version operations causes nfs4_destroy_callback to fail to shutdown the NFSv4.1 callback service. There is no reason to reset the client minorversion operations when the nfs_client struct is being freed. Remove the minorverion reset and rename the function. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:24 -05:00
Andy Adamson	01c9a0bc60	NFS use svc_create_xprt for NFSv4.1 callback service The new back channel transport means we call the normal creation routine as well as svc_xprt_put. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:24 -05:00
Aneesh Kumar K.V	64c2ce8b72	nfsv4: Switch to generic xattr handling code This patch make nfsv4 use the generic xattr handling code to get the nfsv4 acl. This will help us to add richacl support to nfsv4 in later patches Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-04 13:10:41 -05:00
Aneesh Kumar K.V	a8a5da996d	nfs: Set MS_POSIXACL always We want to skip VFS applying mode for NFS. So set MS_POSIXACL always and selectively use umask. Ideally we would want to use umask only when we don't have inheritable ACEs set. But NFS currently don't allow to send umask to the server. So this is best what we can do and this is consistent with NFSv3 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-04 13:10:40 -05:00
Namhyung Kim	bf0c84f161	NFS: use ERR_CAST() Use ERR_CAST() intead of wierd-looking cast. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-04 13:10:39 -05:00
J. Bruce Fields	5f3e97c9ee	nfs: fix mispelling of idmap CONFIG symbol Trivial, but confusing when you're trying to grep through this code.... Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-04 13:10:39 -05:00
Jesper Juhl	878215feb8	NFS: Don't leak in nfs_proc_symlink() Hi, In fs/nfs/proc.c::nfs_proc_symlink() we will leak memory if either nfs_alloc_fhandle() or nfs_alloc_fattr() returns NULL but the other one doesn't. This patch ensures memory allocated by one when the other fails is always released (this is safe since nfs_free_fattr() and nfs_free_fhandle() both call kfree which deals gracefully with NULL pointers). Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-04 13:10:36 -05:00
Trond Myklebust	1174dd1f89	NFSv4: Convert a few commas into semicolons... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-21 11:51:27 -05:00
Stanislav Kinsbursky	aa69947399	NFS: suppressing showing of default mount port value in /proc fixed Update: added check for zero value as it was before (note: can't simply check mountd_port for positive value because it's typeof unsigned short) Default value for mount server port is set to NFS_UNSPEC_PORT (-1) and will not be changed during parsing mount options for mound data version 6. This default value will be showed for mountport in /proc/mounts always since current default check is for zero value. This small mistake leads to big problem, because during umount.nfs execution from old user-space utils (at least nfs-utils 1.0.9) this value will be used as the server port to connect to. This request will be rejected (since port is 65535) and thus nfs mount point can't be unmounted. Note from Chuck Lever (chuck.lever@oracle.com): this is only possible if /etc/mtab is a link to /proc/mounts. Not all systems have this configuration. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-21 11:51:25 -05:00
J. Bruce Fields	611c96c8f7	nfs4: fix units bug causing hang on recovery Note that cl_lease_time is in jiffies. This can cause a very long wait in the NFS4ERR_CLID_INUSE case. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-21 11:51:24 -05:00
Jesper Juhl	72895b1ac7	nfs: Take advantage of kmem_cache_zalloc() in nfs_page_alloc() Take advantage of kmem_cache_zalloc() in nfs_page_alloc(). Save a call to memset() and a few bytes. Before: [jj@dragon linux-2.6]$ size fs/nfs/pagelist.o text data bss dec hex filename 1765 0 8 1773 6ed fs/nfs/pagelist.o After: [jj@dragon linux-2.6]$ size fs/nfs/pagelist.o text data bss dec hex filename 1749 0 8 1757 6dd fs/nfs/pagelist.o Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-21 11:51:24 -05:00
Tobias Klauser	c8b031ebc1	NFS: Remove redundant unlikely() IS_ERR() already implies unlikely(), so it can be omitted here. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-21 11:51:23 -05:00
Chuck Lever	bf2695516d	SUNRPC: New xdr_streams XDR decoder API Now that all client-side XDR decoder routines use xdr_streams, there should be no need to support the legacy calling sequence [rpc_rqst , __be32 , RPC res *] anywhere. We can construct an xdr_stream in the generic RPC code, instead of in each decoder function. This is a refactoring change. It should not cause different behavior. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:25 -05:00
Chuck Lever	9f06c719f4	SUNRPC: New xdr_streams XDR encoder API Now that all client-side XDR encoder routines use xdr_streams, there should be no need to support the legacy calling sequence [rpc_rqst , __be32 , RPC arg *] anywhere. We can construct an xdr_stream in the generic RPC code, instead of in each encoder function. Also, all the client-side encoder functions return 0 now, making a return value superfluous. Take this opportunity to convert them to return void instead. This is a refactoring change. It should not cause different behavior. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:25 -05:00
Chuck Lever	b43cd8c153	NFS: Remove unused UMNT response data structure Clean up. The UMNT request has a NULL response. There's no need to set up a mountres structure for it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:25 -05:00
Chuck Lever	98eb2b4f93	NFS: Avoid return code checking in mount XDR encoder functions Clean up. The trend in the other XDR encoder functions is to BUG() when encoding problems occur, since a problem here is always due to a local coding error. Then, instead of a status, zero is unconditionally returned. Update the mount client XDR encoders to behave this way. To finish the update, use the new-style be32_to_cpup() and cpu_to_be32() macros, and compute the buffer sizes using raw integers instead of sizeof(). This matches the conventions used in other XDR functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:25 -05:00
Chuck Lever	ead0059788	NFS: Squelch compiler warning in decode_getdeviceinfo() Clean up. .../linux/nfs-2.6/fs/nfs/nfs4xdr.c: In function ‘decode_getdeviceinfo’: .../linux/nfs-2.6/fs/nfs/nfs4xdr.c:5008: warning: comparison between signed and unsigned integer expressions Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:24 -05:00
Chuck Lever	573c4e1ef5	NFS: Simplify ->decode_dirent() calling sequence Clean up. The pointer returned by ->decode_dirent() is no longer used as a pointer. The only call site (xdr_decode() in fs/nfs/dir.c) simply extracts the errno value encoded in the pointer. Replace the returned pointer with a standard integer errno return value. Also, pass the "server" argument as part of the nfs_entry instead of as a separate parameter. It's faster to derive "server" in nfs_readdir_xdr_to_array() since we already have the directory's inode handy. "server" ought to be invariant for a set of entries in the same directory, right? The legacy versions of decode_dirent() don't use "server" anyway, so it's wasted work for them to derive and pass "server" for each entry. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:24 -05:00
Chuck Lever	8111f37360	NFS: Fix hdrlen calculation in NFSv4's decode_read() When computing the length of the header, be sure to include the four octets consumed by "count". Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:24 -05:00
Chuck Lever	7d93bd71cb	NFS: Repair whitespace damage in NFS PROC macro Clean up. When I was making other changes in this area, checkscript.pl complained about the use of leading blanks in the PROC macros in the xdr files. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:24 -05:00
Chuck Lever	f604870939	NFS: Move and update xdr_decode_foo() functions that we're keeping Clean up. Move the timestamp decoder to match the placement and naming conventions of the other helpers. Fold xdr_decode_fattr() into decode_fattr3(), which is now it's only user. Fold xdr_decode_wcc_attr() into decode_wcc_attr(), which is now it's only user. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:23 -05:00
Chuck Lever	b2cdd9c9c9	NFS: Remove unused old NFSv3 decoder functions Clean up. Remove unused legacy result decoder functions, and any now unused decoder helper functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:23 -05:00
Chuck Lever	f5fc3c50c9	NFS: Switch in new NFSv3 decoder functions The naming scheme of the new decoder functions, which follows the NFSv4 XDR decoder functions, is slightly different than the scheme used for the old functions. Rename the functions as a separate step to keep the patches clean. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:23 -05:00
Chuck Lever	e4f9323409	NFS: Introduce new-style XDR decoding functions for NFSv2 We'd like to prevent local buffer overflows caused by malicious or broken servers. New xdr_stream style decoders can do that. For efficiency, we also eventually want to be able to pass xdr_streams from call_decode() to all XDR decoding functions, rather than building an xdr_stream in every XDR decoding function in the kernel. Static helper functions are left without the "inline" directive. This allows the compiler to choose automatically how to optimize these for size or speed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:23 -05:00
Chuck Lever	9d5a643439	NFS: Update xdr_encode_foo() functions that we're keeping Clean up. Move the timestamp and the sattr encoder to match the placement convention of the other helpers, update their coding style, and refresh their documenting comments. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:23 -05:00
Chuck Lever	499ff710b2	NFS: Remove unused old NFSv3 encoder functions Clean up. Remove unused legacy argument encoder functions, and any now unused encoder helper functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:22 -05:00
Chuck Lever	ad96b5b5ea	NFS: Replace old NFSv3 encoder functions with xdr_stream-based ones The naming scheme of the new encoder functions, which follows the NFSv4 XDR encoder functions, is slightly different than the scheme used for the old functions. Rename the functions as a separate step to keep the patches clean. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:22 -05:00
Chuck Lever	d9c407b138	NFS: Introduce new-style XDR encoding functions for NFSv3 We're interested in taking advantage of the safety benefits of xdr_streams. These data structures allow more careful checking for buffer overflow while encoding. More careful type checking is also introduced in the new functions. For efficiency, we also eventually want to be able to pass xdr_streams from call_encode() to all XDR encoding functions, rather than building an xdr_stream in every XDR encoding function in the kernel. To do this means all encoders must be ready to handle a passed-in xdr_stream. The new encoders follow the modern paradigm for XDR encoders: BUG on error, and always return a zero status code. Static helper functions are left without the "inline" directive. This allows the compiler to choose automatically how to optimize these for size or speed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:22 -05:00
Chuck Lever	5f96e5e31b	NFS: Move and update xdr_decode_foo() functions that we're keeping Clean up. Move the timestamp decoder to match the placement and naming conventions of the other helpers. Fold xdr_decode_fattr() into decode_fattr(), which is now it's only user. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:21 -05:00
Chuck Lever	661ad4239a	NFS: Replace old NFSv2 decoder functions with xdr_stream-based ones Clean up. Remove unused legacy result decoder functions, and any now unused decoder helper functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:21 -05:00
Chuck Lever	f796f8b3ae	NFS: Introduce new-style XDR decoding functions for NFSv2 We'd like to prevent local buffer overflows caused by malicious or broken servers. New xdr_stream style decoders can do that. For efficiency, we also eventually want to be able to pass xdr_streams from call_decode() to all XDR decoding functions, rather than building an xdr_stream in every XDR decoding function in the kernel. nfs_decode_dirent() is renamed to follow the naming convention of the other two dirent decoders. Static helper functions are left without the "inline" directive. This allows the compiler to choose automatically how to optimize these for size or speed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:21 -05:00
Chuck Lever	8582849324	NFS: Use the "nfs_stat" enum for nfs_stat_to_errno()'s argument Clean up. To distinguish more clearly between the on-the-wire NFSERR_ value and our local errno values, use the proper type for the argument of nfs_stat_to_errno(). Add a documenting comment appropriate for a global function shared outside this source file. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:21 -05:00
Chuck Lever	282ac2a573	NFS: Update xdr_encode_foo() functions that we're keeping Clean up. The new helper functions are kept in order by section of RFC 1094. Move the two timestamp encoders we're keeping, update their coding style, and refresh their documenting comments. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:20 -05:00
Chuck Lever	2d70f533ea	NFS: Remove old NFSv2 encoder functions Clean up: Remove unused legacy argument encoder functions, and any now unused encoder helper functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:20 -05:00
Chuck Lever	25a0866cc6	NFS: Introduce new-style XDR encoding functions for NFSv2 We're interested in taking advantage of the safety benefits of xdr_streams. These data structures allow more careful checking for buffer overflow while encoding. More careful type checking is also introduced in the new functions. For efficiency, we also eventually want to be able to pass xdr_streams from call_encode() to all XDR encoding functions, rather than building an xdr_stream in every XDR encoding function in the kernel. To do this means all encoders must be ready to handle a passed-in xdr_stream. The new encoders follow the modern paradigm for XDR encoders: BUG on any error, and always return a zero status code. Static helper functions are left without the "inline" directive. This allows the compiler to choose automatically how to optimize these for size or speed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:20 -05:00
Chuck Lever	5b362ac379	NFS: Fix panic after nfs_umount() After a few unsuccessful NFS mount attempts in which the client and server cannot agree on an authentication flavor both support, the client panics. nfs_umount() is invoked in the kernel in this case. Turns out nfs_umount()'s UMNT RPC invocation causes the RPC client to write off the end of the rpc_clnt's iostat array. This is because the mount client's nrprocs field is initialized with the count of defined procedures (two: MNT and UMNT), rather than the size of the client's proc array (four). The fix is to use the same initialization technique used by most other upper layer clients in the kernel. Introduced by commit `0b524123`, which failed to update nrprocs when support was added for UMNT in the kernel. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=24302 BugLink: http://bugs.launchpad.net/bugs/683938 Reported-by: Stefan Bader <stefan.bader@canonical.com> Tested-by: Stefan Bader <stefan.bader@canonical.com> Cc: stable@kernel.org # >= 2.6.32 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-10 13:01:50 -05:00
Trond Myklebust	2df485a774	nfs: remove extraneous and problematic calls to nfs_clear_request When a nfs_page is freed, nfs_free_request is called which also calls nfs_clear_request to clean out the lock and open contexts and free the pagecache page. However, a couple of places in the nfs code call nfs_clear_request themselves. What happens here if the refcount on the request is still high? We'll be releasing contexts and freeing pointers while the request is possibly still in use. Remove those bare calls to nfs_clear_context. That should only be done when the request is being freed. Note that when doing this, we need to watch out for tests of req->wb_page. Previously, nfs_set_page_tag_locked() and nfs_clear_page_tag_locked() would check the value of req->wb_page to figure out if the page is mapped into the nfsi->nfs_page_tree. We now indicate the page is mapped using the new bit PG_MAPPED in req->wb_flags . Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-07 23:02:44 -05:00
Mi Jinlong	0de1b7e800	nfs: kernel should return EPROTONOSUPPORT when not support NFSv4 When nfs client(kernel) don't support NFSv4, maybe user build kernel without NFSv4, there is a problem. Using command "mount SERVER-IP:/nfsv3 /mnt/" to mount NFSv3 filesystem, mount should should success, but fail and get error: "mount.nfs: an incorrect mount option was specified" System call mount "nfs"(not "nfs4") with "vers=4", if CONFIG_NFS_V4 is not defined, the "vers=4" will be parsed as invalid argument and kernel return EINVAL to nfs-utils. About that, we really want get EPROTONOSUPPORT rather than EINVAL. This path make sure kernel parses argument success, and return EPROTONOSUPPORT at nfs_validate_mount_data(). Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-07 19:30:44 -05:00
Sergey Vlasov	21ac19d484	NFS: Fix fcntl F_GETLK not reporting some conflicts The commit `129a84de23` (locks: fix F_GETLK regression (failure to find conflicts)) fixed the posix_test_lock() function by itself, however, its usage in NFS changed by the commit `9d6a8c5c21` (locks: give posix_test_lock same interface as ->lock) remained broken - subsequent NFS-specific locking code received F_UNLCK instead of the user-specified lock type. To fix the problem, fl->fl_type needs to be saved before the posix_test_lock() call and restored if no local conflicts were reported. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=23892 Tested-by: Alexander Morozov <amorozov@etersoft.ru> Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Cc: <stable@kernel.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-07 19:30:43 -05:00
Aneesh Kumar K.V	08a22b392a	nfs: Discard ACL cache on mode update An update of mode bits can result in ACL value being changed. We need to mark the acl cache invalid when we update mode. Similarly we need to update file attribute when we change ACL value Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-07 19:30:42 -05:00
Trond Myklebust	47c716cbf6	NFS: Readdir cleanups No functional changes, but clarify the code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-07 14:09:02 -05:00
Trond Myklebust	18fb5fe40c	NFS: nfs_readdir_search_for_cookie() don't mark as eof if cookie not found If we're searching for a specific cookie, and it isn't found in the page cache, we should try an uncached_readdir(). To do so, we return EBADCOOKIE, but we don't set desc->eof. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-07 12:41:58 -05:00
Trond Myklebust	11de3b11e0	NFS: Fix a memory leak in nfs_readdir We need to ensure that the entries in the nfs_cache_array get cleared when the page is removed from the page cache. To do so, we use the freepage address_space operation. Change nfs_readdir_clear_array to use kmap_atomic(), so that the function can be safely called from all contexts. Finally, modify the cache_page_release helper to call nfs_readdir_clear_array directly, when dealing with an anonymous page from 'uncached_readdir'. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-02 09:58:00 -05:00
Trond Myklebust	0aded708d1	NFS: Ensure we use the correct cookie in nfs_readdir_xdr_filler We need to use the cookie from the previous array entry, not the actual cookie that we are searching for (except for the case of uncached_readdir). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-01 08:16:16 -05:00
Trond Myklebust	37a09f0745	NFS: Fix a readdirplus bug When comparing filehandles in the helper nfs_same_file(), we should not be using 'strncmp()': filehandles are not null terminated strings. Instead, we should just use the existing helper nfs_compare_fh(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-30 10:18:49 -08:00
Trond Myklebust	0b26a0bf6f	NFS: Ensure we return the dirent->d_type when it is known Store the dirent->d_type in the struct nfs_cache_array_entry so that we can use it in getdents() calls. This fixes a regression with the new readdir code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-22 13:24:48 -05:00
Trond Myklebust	3020093f57	NFS: Correct the array bound calculation in nfs_readdir_add_to_array It looks as if the array size calculation in MAX_READDIR_ARRAY does not take the alignment of struct nfs_cache_array_entry into account. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-22 13:24:47 -05:00
Trond Myklebust	ece0b4233b	NFS: Don't ignore errors from nfs_do_filldir() We should ignore the errors from the filldir callback, and just interpret them as meaning we should exit, however we should definitely pass back ENOMEM errors. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-22 13:24:47 -05:00
Trond Myklebust	85f8607e16	NFS: Fix the error handling in "uncached_readdir()" Currently, uncached_readdir() is broken because if fails to handle the results from nfs_readdir_xdr_to_array() correctly. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-22 13:24:46 -05:00
Trond Myklebust	7a8e1dc34f	NFS: Fix a page leak in uncached_readdir() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-22 13:24:45 -05:00
Trond Myklebust	e7c58e974a	NFS: Fix a page leak in nfs_do_filldir() nfs_do_filldir() must always free desc->page when it is done, otherwise we end up leaking the page. Also remove unused variable 'dentry'. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-22 13:24:44 -05:00
Trond Myklebust	5c346854d8	NFS: Assume eof if the server returns no readdir records Some servers are known to be buggy w.r.t. this. Deal with them... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-22 13:24:44 -05:00
Trond Myklebust	463a376eae	NFS: Buffer overflow in ->decode_dirent() should not be fatal Overflowing the buffer in the readdir ->decode_dirent() should not lead to a fatal error, but rather to an attempt to reread the record in question. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-22 13:24:43 -05:00
Arun Bharadwaj	b47d19de2c	Pure nfs client performance using odirect. When an application opens a file with O_DIRECT flag, if the size of the data that is written is equal to wsize, the client sends a WRITE RPC with stable flag set to UNSTABLE followed by a single COMMIT RPC rather than sending a single WRITE RPC with the stable flag set to FILE_SYNC. This a bug. Patch to fix this. Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-22 13:24:42 -05:00
Arnd Bergmann	451a3c24b0	BKL: remove extraneous #include <smp_lock.h> The big kernel lock has been removed from all these files at some point, leaving only the #include. Remove this too as a cleanup. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-17 08:59:32 -08:00
Catalin Marinas	04e4bd1c67	nfs: Ignore kmemleak false positive in nfs_readdir_make_qstr Strings allocated via kmemdup() in nfs_readdir_make_qstr() are referenced from the nfs_cache_array which is stored in a page cache page. Kmemleak does not scan such pages and it reports several false positives. This patch annotates the string->name pointer so that kmemleak does not consider it a real leak. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Bryan Schumaker <bjschuma@netapp.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-16 12:03:14 -05:00
Trond Myklebust	ac39612824	NFS: readdir shouldn't read beyond the reply returned by the server Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-15 20:44:29 -05:00
Trond Myklebust	8cd51a0ccd	NFS: Fix a couple of regressions in readdir. Fix up the issue that array->eof_index needs to be able to be set even if array->size == 0. Ensure that we catch all important memory allocation error conditions and/or kmap() failures. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-15 20:44:28 -05:00
Trond Myklebust	23ebbd9acf	Revert "NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIR" This reverts commit `80e60639f1`. This change requires further fixes to ensure that the open doesn't succeed if the lookup later results in a regular file being created. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-15 20:44:27 -05:00
Paulius Zaleckas	1e657bd51f	Regression: fix mounting NFS when NFSv3 support is not compiled Trying to mount NFS (root partition in my case) fails if CONFIG_NFS_V3 is not selected. nfs_validate_mount_data() returns EPROTONOSUPPORT, because of this check: #ifndef CONFIG_NFS_V3 if (args->version == 3) goto out_v3_not_compiled; #endif /* !CONFIG_NFS_V3 */ and args->version was always initialized to 3. It was working in 2.6.36 Signed-off-by: Paulius Zaleckas <paulius.zaleckas@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-15 20:44:27 -05:00
Christoph Hellwig	51ee4b84f5	locks: let the caller free file_lock on ->setlease failure The caller allocated it, the caller should free it. The only issue so far is that we could change the flp pointer even on an error return if the fl_change callback failed. But we can simply move the flp assignment after the fl_change invocation, as the callers don't care about the flp return value if the setlease call failed. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-31 06:35:15 -07:00
J. Bruce Fields	05fa3135fd	locks: fix setlease methods to free passed-in lock We modified setlease to require the caller to allocate the new lease in the case of creating a new lease, but forgot to fix up the filesystem methods. Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-30 18:08:15 -07:00
Al Viro	31f43471e9	convert simple cases of nfs-related ->get_sb() to ->mount() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-29 04:17:23 -04:00
Al Viro	a4118ee1d8	a couple of open-coded ihold() introduced by nfs merge Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-29 04:14:48 -04:00
Geert Uytterhoeven	12364a4f05	nfs4: The difference of 2 pointers is ptrdiff_t On m68k, which is 32-bit: fs/nfs/nfs4proc.c: In function ‘nfs41_sequence_done’: fs/nfs/nfs4proc.c:432: warning: format ‘%ld’ expects type ‘long int’, but argument 3 has type ‘int’ fs/nfs/nfs4proc.c: In function ‘nfs4_setup_sequence’: fs/nfs/nfs4proc.c:576: warning: format ‘%ld’ expects type ‘long int’, but argument 5 has type ‘int’ On 32-bit, ptrdiff_t is int; on 64-bit, ptrdiff_t is long. Introduced by commit `dfb4f30983` ("NFSv4.1: keep seq_res.sr_slot as pointer rather than an index") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-28 15:49:29 -04:00
Dan Carpenter	8f0d97b415	nfs: testing the wrong variable The intent was to test "*desc" for allocation failures, but it tests "desc" which is always a valid pointer here. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-28 11:18:00 -04:00
Jeff Layton	015f0212d5	nfs: handle lock context allocation failures in nfs_create_request nfs_get_lock_context can return NULL on an allocation failure. Regression introduced by commit `f11ac8db`. Reported-by: Steve Dickson <steved@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: stable@kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-28 11:17:25 -04:00
Steve Dickson	568a810d7e	Fixed Regression in NFS Direct I/O path A typo, introduced by commit `f11ac8db`, in the nfs_direct_write() routine causes writes with O_DIRECT set to fail with a ENOMEM error. Found-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve Dickson <steved@redhat.com> Cc: stable@kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-28 11:14:05 -04:00
Linus Torvalds	7420a8c0de	Merge branch 'flock' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'flock' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: locks: turn lock_flocks into a spinlock fasync: re-organize fasync entry insertion to allow it under a spinlock locks/nfsd: allocate file lock outside of spinlock lockd: fix nlmsvc_notify_blocked locking lockd: push lock_flocks down	2010-10-27 18:13:34 -07:00
Arnd Bergmann	763641d812	lockd: push lock_flocks down lockd should use lock_flocks() instead of lock_kernel() to lock against posix locks accessing the i_flock list. This is a prerequisite to turning lock_flocks into a spinlock. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: J. Bruce Fields <bfields@redhat.com>	2010-10-27 21:39:39 +02:00
Linus Torvalds	426e1f5cec	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits) split invalidate_inodes() fs: skip I_FREEING inodes in writeback_sb_inodes fs: fold invalidate_list into invalidate_inodes fs: do not drop inode_lock in dispose_list fs: inode split IO and LRU lists fs: switch bdev inode bdi's correctly fs: fix buffer invalidation in invalidate_list fsnotify: use dget_parent smbfs: use dget_parent exportfs: use dget_parent fs: use RCU read side protection in d_validate fs: clean up dentry lru modification fs: split __shrink_dcache_sb fs: improve DCACHE_REFERENCED usage fs: use percpu counter for nr_dentry and nr_dentry_unused fs: simplify __d_free fs: take dcache_lock inside __d_path fs: do not assign default i_ino in new_inode fs: introduce a per-cpu last_ino allocator new helper: ihold() ...	2010-10-26 17:58:44 -07:00
Linus Torvalds	18a043f941	Merge branch 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: rename nfs.upcall -> nfs.idmap NFS: Fix a compile issue in nfs_root	2010-10-26 17:24:28 -07:00
Wu Fengguang	1b430beee5	writeback: remove nonblocking/encountered_congestion references This removes more dead code that was somehow missed by commit `0d99519efe` (writeback: remove unused nonblocking and congestion checks). There are no behavior change except for the removal of two entries from one of the ext4 tracing interface. The nonblocking checks in ->writepages are no longer used because the flusher now prefer to block on get_request_wait() than to skip inodes on IO congestion. The latter will lead to more seeky IO. The nonblocking checks in ->writepage are no longer used because it's redundant with the WB_SYNC_NONE check. We no long set ->nonblocking in VM page out and page migration, because a) it's effectively redundant with WB_SYNC_NONE in current code b) it's old semantic of "Don't get stuck on request queues" is mis-behavior: that would skip some dirty inodes on congestion and page out others, which is unfair in terms of LRU age. Inspired by Christoph Hellwig. Thanks! Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: David Howells <dhowells@redhat.com> Cc: Sage Weil <sage@newdream.net> Cc: Steve French <sfrench@samba.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:05 -07:00
Trond Myklebust	036a107597	NFS: Fix a compile issue in nfs_root Stephen Rothwell reports: > /home/test/linux-2.6/fs/nfs/nfsroot.c: In function 'nfs_root_debug': > /home/test/linux-2.6/fs/nfs/nfsroot.c:110:2: error: 'nfs_debug' > undeclared (first use in this function) > /home/test/linux-2.6/fs/nfs/nfsroot.c:110:2: note: each undeclared > identifier is reported only once for each function it appears in > make[3]: * [fs/nfs/nfsroot.o] Error 1 > make[2]: * [fs/nfs] Error 2 > make[1]: * [fs] Error 2 > make: * [sub-make] Error 2 Which is caused by commit `306a075362` (NFS: Allow NFSROOT debugging messages to be enabled dynamically) Fix is to disable this code when RPC_DEBUG is disabled. Reported-by: Zimny Lech <napohybelskurwysynom2010@gmail.com> Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-26 13:56:42 -04:00
Linus Torvalds	4390110fef	Merge branch 'for-2.6.37' of git://linux-nfs.org/~bfields/linux * 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: (99 commits) svcrpc: svc_tcp_sendto XPT_DEAD check is redundant svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue svcrpc: assume svc_delete_xprt() called only once svcrpc: never clear XPT_BUSY on dead xprt nfsd4: fix connection allocation in sequence() nfsd4: only require krb5 principal for NFSv4.0 callbacks nfsd4: move minorversion to client nfsd4: delay session removal till free_client nfsd4: separate callback change and callback probe nfsd4: callback program number is per-session nfsd4: track backchannel connections nfsd4: confirm only on succesful create_session nfsd4: make backchannel sequence number per-session nfsd4: use client pointer to backchannel session nfsd4: move callback setup into session init code nfsd4: don't cache seq_misordered replies SUNRPC: Properly initialize sock_xprt.srcaddr in all cases SUNRPC: Use conventional switch statement when reclassifying sockets sunrpc/xprtrdma: clean up workqueue usage sunrpc: Turn list_for_each-s into the ..._entry-s ... Fix up trivial conflicts (two different deprecation notices added in separate branches) in Documentation/feature-removal-schedule.txt	2010-10-26 09:55:25 -07:00
Linus Torvalds	a4dd8dce14	Merge branch 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: net/sunrpc: Use static const char arrays nfs4: fix channel attribute sanity-checks NFSv4.1: Use more sensible names for 'initialize_mountpoint' NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure NFSv4.1: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure NFS: client needs to maintain list of inodes with active layouts NFS: create and destroy inode's layout cache NFSv4.1: pnfs: filelayout: introduce minimal file layout driver NFSv4.1: pnfs: full mount/umount infrastructure NFS: set layout driver NFS: ask for layouttypes during v4 fsinfo call NFS: change stateid to be a union NFSv4.1: pnfsd, pnfs: protocol level pnfs constants SUNRPC: define xdr_decode_opaque_fixed NFSD: remove duplicate NFS4_STATEID_SIZE	2010-10-26 09:52:09 -07:00
J. Bruce Fields	43c2e885be	nfs4: fix channel attribute sanity-checks The sanity checks here are incorrect; in the worst case they allow values that crash the client. They're also over-reliant on the preprocessor. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-25 22:19:41 -04:00
Al Viro	7de9c6ee3e	new helper: ihold() Clones an existing reference to inode; caller must already hold one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:11 -04:00
Linus Torvalds	74eb94b218	Merge branch 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (67 commits) SUNRPC: Cleanup duplicate assignment in rpcauth_refreshcred nfs: fix unchecked value Ask for time_delta during fsinfo probe Revalidate caches on lock SUNRPC: After calling xprt_release(), we must restart from call_reserve NFSv4: Fix up the 'dircount' hint in encode_readdir NFSv4: Clean up nfs4_decode_dirent NFSv4: nfs4_decode_dirent must clear entry->fattr->valid NFSv4: Fix a regression in decode_getfattr NFSv4: Fix up decode_attr_filehandle() to handle the case of empty fh pointer NFS: Ensure we check all allocation return values in new readdir code NFS: Readdir plus in v4 NFS: introduce generic decode_getattr function NFS: check xdr_decode for errors NFS: nfs_readdir_filler catch all errors NFS: readdir with vmapped pages NFS: remove page size checking code NFS: decode_dirent should use an xdr_stream SUNRPC: Add a helper function xdr_inline_peek NFS: remove readdir plus limit ...	2010-10-25 13:48:29 -07:00
Trond Myklebust	1c787096fc	NFSv4.1: Use more sensible names for 'initialize_mountpoint' The initialize_mountpoint/uninitialise_mountpoint functions are really about setting or clearing the layout driver to be used on this filesystem. Change the names to the more descriptive 'set_layoutdriver/clear_layoutdriver'. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-24 18:07:11 -04:00
Andy Adamson	16b374ca43	NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure Implement the driver's io_ops->alloc_lseg and free_lseg functions, which integrate into the deviceid cache and calls out to nfs4_proc_getdeviceinfo when necessary. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Dean Hildebrand <dhildebz@umich.edu> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-24 18:07:11 -04:00
Andy Adamson	b1f69b754e	NFSv4.1: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure Add the ability to actually send LAYOUTGET and GETDEVICEINFO. This also adds in the machinery to handle layout state and the deviceid cache. Note that GETDEVICEINFO is not called directly by the generic layer. Instead it is called by the drivers while parsing the LAYOUTGET opaque data in response to an unknown device id embedded therein. RFC 5661 only encodes device ids within the driver-specific opaque data. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Dean Hildebrand <dhildebz@umich.edu> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-24 18:07:10 -04:00
Andy Adamson	974cec8ca0	NFS: client needs to maintain list of inodes with active layouts In particular, server reboot will invalidate all layouts. Note that in order to have an active layout, we must get a successful response from the server. To avoid adding that machinery, this patch just includes a stub that fakes up a successful return. Since the layout is never referenced for io, this is not a problem. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Dean Hildebrand <dhildebz@umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-24 18:07:10 -04:00
Benny Halevy	e5e940170b	NFS: create and destroy inode's layout cache At the start of the io paths, try to grab the relevant layout information. This will initiate the inode's layout cache, but stubs ensure the cache stays empty. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Dean Hildebrand <dhildebz@umich.edu> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-24 18:07:10 -04:00
Dean Hildebrand	7ab672ce31	NFSv4.1: pnfs: filelayout: introduce minimal file layout driver This driver just registers itself and supplies trivial mount/umount functions. Signed-off-by: Dean Hildebrand <dhildebz@umich.edu> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-24 18:07:10 -04:00
Fred Isaman	02c35fca7c	NFSv4.1: pnfs: full mount/umount infrastructure Allow a module implementing a layout type to register, and have its mount/umount routines called for filesystems that the server declares support it. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Bian Naimeng <biannm@cn.fujitsu.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-24 18:07:10 -04:00
Ricardo Labiaga	85e174ba6b	NFS: set layout driver Put in the infrastructure that uses information returned from the server at mount to select a layout driver module. In this patch, a stub is used that always returns "no driver found". Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Dean Hildebrand <dhildebz@umich.edu> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-24 18:07:10 -04:00
Andy Adamson	504913fbc8	NFS: ask for layouttypes during v4 fsinfo call This information will be used to determine which layout driver, if any, to use for subsequent IO on this filesystem. Each driver is assigned an integer id, with 0 reserved to indicate no driver. The server can in theory return multiple ids. However, our current client implementation only notes the first entry and ignores the rest. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-24 18:07:09 -04:00
Alexandros Batsakis	9449925273	NFS: change stateid to be a union In NFSv4.1 the stateid consists of the other and seqid fields. For layout processing we need to numerically compare the seqid value of layout stateids. To do so, introduce a union to nfs4_stateid to switch between opaque(16 bytes) and opaque(12 bytes) / __be32 Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-24 18:02:53 -04:00
Roman Borisov	3388bff5cf	nfs: fix unchecked value Return value of "decode_attr_bitmap()" was not checked; Signed-off-by: Roman Borisov <ext-roman.borisov@nokia.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-24 18:00:12 -04:00
Ricardo Labiaga	55b6e7742d	Ask for time_delta during fsinfo probe Used by the client to determine if the server has a granular enough time stamp. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-24 18:00:04 -04:00
Ricardo Labiaga	6b96724e50	Revalidate caches on lock Instead of blindly zapping the caches, attempt to revalidate them if the server has indicated that it uses high resolution timestamps. NFSv4 should be able to always revalidate the cache since the protocol requires the update of the change attribute on modification of the data. In reality, there are servers (the Linux NFS server for example) that do not obey this requirement and use ctime as the basis for change attribute. Long term, the server needs to be fixed. At this time, and to be on the safe side, continue zapping caches if the server indicates that it does not have a high resolution timestamp. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-24 17:59:56 -04:00
Trond Myklebust	6f7a35bd23	NFSv4: Fix up the 'dircount' hint in encode_readdir Also ensure we only ask for either fileid or mounted_on_fileid in the readdirplus case too... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-24 13:20:25 -04:00
Trond Myklebust	9af8c222ca	NFSv4: Clean up nfs4_decode_dirent Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-24 13:18:04 -04:00
Trond Myklebust	4f082222fa	NFSv4: nfs4_decode_dirent must clear entry->fattr->valid Otherwise, we may end up reading uninitialised data from the resulting struct nfs_fattr. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-24 13:14:02 -04:00
Trond Myklebust	3201f3dd73	NFSv4: Fix a regression in decode_getfattr We don't want to have the mounted_on_fileid overwrite the true fileid. We only return the former if the server didn't supply the true fileid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-23 15:43:10 -04:00
Trond Myklebust	7ad0735300	NFSv4: Fix up decode_attr_filehandle() to handle the case of empty fh pointer decode_attr_filehandle still needs to skip the XDR-encoded filehandle if someone passes a null pointer argument. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-23 15:34:20 -04:00
Trond Myklebust	4a201d6e3f	NFS: Ensure we check all allocation return values in new readdir code Also some clean ups. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-23 15:27:38 -04:00
Bryan Schumaker	82f2e5472e	NFS: Readdir plus in v4 By requsting more attributes during a readdir, we can mimic the readdir plus operation that was in NFSv3. To test, I ran the command `ls -lU --color=none` on directories with various numbers of files. Without readdir plus, I see this: n files \| 100 \| 1,000 \| 10,000 \| 100,000 \| 1,000,000 --------+-----------+-----------+-----------+-----------+---------- real \| 0m00.153s \| 0m00.589s \| 0m05.601s \| 0m56.691s \| 9m59.128s user \| 0m00.007s \| 0m00.007s \| 0m00.077s \| 0m00.703s \| 0m06.800s sys \| 0m00.010s \| 0m00.070s \| 0m00.633s \| 0m06.423s \| 1m10.005s access \| 3 \| 1 \| 1 \| 4 \| 31 getattr \| 2 \| 1 \| 1 \| 1 \| 1 lookup \| 104 \| 1,003 \| 10,003 \| 100,003 \| 1,000,003 readdir \| 2 \| 16 \| 158 \| 1,575 \| 15,749 total \| 111 \| 1,021 \| 10,163 \| 101,583 \| 1,015,784 With readdir plus enabled, I see this: n files \| 100 \| 1,000 \| 10,000 \| 100,000 \| 1,000,000 --------+-----------+-----------+-----------+-----------+---------- real \| 0m00.115s \| 0m00.206s \| 0m01.079s \| 0m12.521s \| 2m07.528s user \| 0m00.003s \| 0m00.003s \| 0m00.040s \| 0m00.290s \| 0m03.296s sys \| 0m00.007s \| 0m00.020s \| 0m00.120s \| 0m01.357s \| 0m17.556s access \| 3 \| 1 \| 1 \| 1 \| 7 getattr \| 2 \| 1 \| 1 \| 1 \| 1 lookup \| 4 \| 3 \| 3 \| 3 \| 3 readdir \| 6 \| 62 \| 630 \| 6,300 \| 62,993 total \| 15 \| 67 \| 635 \| 6,305 \| 63,004 Readdir plus disabled has about a 16x increase in the number of rpc calls and is 4 - 5 times slower on large directories. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-23 15:27:37 -04:00
Bryan Schumaker	ae42c70a60	NFS: introduce generic decode_getattr function Getattr should be able to decode errors and the readdir file handle. decode_getfattr_attrs does the actual attribute decoding, while decode_getfattr_generic will check the opcode before decoding. This will let other functions call decode_getfattr_attrs to decode their attributes. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-23 15:27:37 -04:00
Bryan Schumaker	9942438089	NFS: check xdr_decode for errors Check if the decoded entry has the eof bit set when returning from xdr_decode with an error. If it does, we should set the eof bits in the array before returning. This should keep us from looping when we expect more data but the server doesn't give us anything new. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-23 15:27:36 -04:00
Bryan Schumaker	3c8a1aeed8	NFS: nfs_readdir_filler catch all errors Check for all errors, not a specific one. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-23 15:27:35 -04:00
Bryan Schumaker	56e4ebf877	NFS: readdir with vmapped pages We can use vmapped pages to read more information from the network at once. This will reduce the number of calls needed to complete a readdir. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> [trondmy: Added #include for linux/vmalloc.h> in fs/nfs/dir.c] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-23 15:27:35 -04:00
Bryan Schumaker	afa8ccc978	NFS: remove page size checking code Remove the page size checking code for a readdir decode. This is now done by decode_dirent with xdr_streams. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-23 15:27:34 -04:00
Bryan Schumaker	babddc72a9	NFS: decode_dirent should use an xdr_stream Convert nfs*xdr.c to use an xdr stream in decode_dirent. This will prevent a kernel oops that has been occuring when reading a vmapped page. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-23 15:27:33 -04:00
Bryan Schumaker	0715dc632a	NFS: remove readdir plus limit We will now use readdir plus even on directories that are very large. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-23 15:27:32 -04:00
Bryan Schumaker	d39ab9de3b	NFS: re-add readdir plus This patch adds readdir plus support to the cache array. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-23 15:27:31 -04:00
Trond Myklebust	baf57a09e9	NFS: Optimise the readdir searches If we're going through the loop in nfs_readdir() more than once, we usually do not want to restart searching from the beginning of the pages cache. We only want to do that if the previous search failed... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-23 15:27:30 -04:00
Bryan Schumaker	d1bacf9eb2	NFS: add readdir cache array This patch adds the readdir cache array and functions to retreive the array stored on a cache page, clear the array by freeing allocated memory, add an entry to the array, and search the array for a given cookie. It then modifies readdir to make use of the new cache array. With the new cache array method, we no longer need some of this code. Finally, nfs_llseek_dir() will set file->f_pos to a value greater than 0 and desc->dir_cookie to zero. When we see this, readdir needs to find the file at position file->f_pos from the start of the directory. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-23 15:27:30 -04:00
Randy Dunlap	8c7597f6ce	nfs: include ratelimit.h, fix nfs4state build error nfs4state.c uses interfaces from ratelimit.h. It needs to include that header file to fix build errors: fs/nfs/nfs4state.c:1195: warning: type defaults to 'int' in declaration of 'DEFINE_RATELIMIT_STATE' fs/nfs/nfs4state.c:1195: warning: parameter names (without types) in function declaration fs/nfs/nfs4state.c:1195: error: invalid storage class for function 'DEFINE_RATELIMIT_STATE' fs/nfs/nfs4state.c:1195: error: implicit declaration of function '__ratelimit' fs/nfs/nfs4state.c:1195: error: '_rs' undeclared (first use in this function) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-23 15:27:29 -04:00
Trond Myklebust	168667c43b	NFSv4: The state manager must ignore EKEYEXPIRED. Otherwise, we cannot recover state correctly. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-23 15:27:28 -04:00
Trond Myklebust	898f635c42	NFSv4: Don't ignore the error return codes from nfs_intent_set_file If nfs_intent_set_file() returns an error, we usually want to pass that back up the stack. Also ensure that nfs_open_revalidate() returns '1' on success. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-23 15:27:17 -04:00
Linus Torvalds	79f14b7c56	Merge branch 'vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: (30 commits) BKL: remove BKL from freevxfs BKL: remove BKL from qnx4 autofs4: Only declare function when CONFIG_COMPAT is defined autofs: Only declare function when CONFIG_COMPAT is defined ncpfs: Lock socket in ncpfs while setting its callbacks fs/locks.c: prepare for BKL removal BKL: Remove BKL from ncpfs BKL: Remove BKL from OCFS2 BKL: Remove BKL from squashfs BKL: Remove BKL from jffs2 BKL: Remove BKL from ecryptfs BKL: Remove BKL from afs BKL: Remove BKL from USB gadgetfs BKL: Remove BKL from autofs4 BKL: Remove BKL from isofs BKL: Remove BKL from fat BKL: Remove BKL from ext2 filesystem BKL: Remove BKL from do_new_mount() BKL: Remove BKL from cgroup BKL: Remove BKL from NTFS ...	2010-10-22 10:52:01 -07:00
Linus Torvalds	5704e44d28	Merge branch 'config' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'config' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: BKL: introduce CONFIG_BKL. dabusb: remove the BKL sunrpc: remove the big kernel lock init/main.c: remove BKL notations blktrace: remove the big kernel lock rtmutex-tester: make it build without BKL dvb-core: kill the big kernel lock dvb/bt8xx: kill the big kernel lock tlclk: remove big kernel lock fix rawctl compat ioctls breakage on amd64 and itanic uml: kill big kernel lock parisc: remove big kernel lock cris: autoconvert trivial BKL users alpha: kill big kernel lock isapnp: BKL removal s390/block: kill the big kernel lock hpet: kill BKL, add compat_ioctl	2010-10-22 10:43:11 -07:00
Arnd Bergmann	6de5bd128d	BKL: introduce CONFIG_BKL. With all the patches we have queued in the BKL removal tree, only a few dozen modules are left that actually rely on the BKL, and even there are lots of low-hanging fruit. We need to decide what to do about them, this patch illustrates one of the options: Every user of the BKL is marked as 'depends on BKL' in Kconfig, and the CONFIG_BKL becomes a user-visible option. If it gets disabled, no BKL using module can be built any more and the BKL code itself is compiled out. The one exception is file locking, which is practically always enabled and does a 'select BKL' instead. This effectively forces CONFIG_BKL to be enabled until we have solved the fs/lockd mess and can apply the patch that removes the BKL from fs/locks.c. Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-21 15:44:13 +02:00
Trond Myklebust	6eaa61496f	NFSv4: Don't call nfs4_reclaim_complete() on receiving NFS4ERR_STALE_CLIENTID If the server sends us an NFS4ERR_STALE_CLIENTID while the state management thread is busy reclaiming state, we do want to treat all state that wasn't reclaimed before the STALE_CLIENTID as if a network partition occurred (see the edge conditions described in RFC3530 and RFC5661). What we do not want to do is to send an nfs4_reclaim_complete(), since we haven't yet even started reclaiming state after the server rebooted. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2010-10-19 19:42:53 -04:00
Trond Myklebust	ae1007d37e	NFSv4: Don't call nfs4_state_mark_reclaim_reboot() from error handlers In the case of a server reboot, the state recovery thread starts by calling nfs4_state_end_reclaim_reboot() in order to avoid edge conditions when the server reboots while the client is in the middle of recovery. However, if the client has already marked the nfs4_state as requiring reboot recovery, then the above behaviour will cause the recovery thread to treat the open as if it was part of such an edge condition: the open will be recovered as if it was part of a lease expiration (and all the locks will be lost). Fix is to remove the call to nfs4_state_mark_reclaim_reboot from nfs4_async_handle_error(), and nfs4_handle_exception(). Instead we leave it to the recovery thread to do this for us. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2010-10-19 19:42:33 -04:00
Trond Myklebust	b0ed9dbc24	NFSv4: Fix open recovery NFSv4 open recovery is currently broken: since we do not clear the state->flags states before attempting recovery, we end up with the 'can_open_cached()' function triggering. This again leads to no OPEN call being put on the wire. Reported-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2010-10-19 19:41:55 -04:00
Trond Myklebust	bc4866b6e0	NFS: Don't SIGBUS if nfs_vm_page_mkwrite races with a cache invalidation In the case where we lock the page, and then find out that the page has been thrown out of the page cache, we should just return VM_FAULT_NOPAGE. This is what block_page_mkwrite() does in these situations. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2010-10-19 19:37:54 -04:00
Bryan Schumaker	955a857e06	NFS: new idmapper This patch creates a new idmapper system that uses the request-key function to place a call into userspace to map user and group ids to names. The old idmapper was single threaded, which prevented more than one request from running at a single time. This means that a user would have to wait for an upcall to finish before accessing a cached result. The upcall result is stored on a keyring of type id_resolver. See the file Documentation/filesystems/nfs/idmapper.txt for instructions. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> [Trond: fix up the return value of nfs_idmap_lookup_name and clean up code] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-07 18:48:49 -04:00
Arnd Bergmann	b89f432133	fs/locks.c: prepare for BKL removal This prepares the removal of the big kernel lock from the file locking code. We still use the BKL as long as fs/lockd uses it and ceph might sleep, but we can flip the definition to a private spinlock as soon as that's done. All users outside of fs/lockd get converted to use lock_flocks() instead of lock_kernel() where appropriate. Based on an earlier patch to use a spinlock from Matthew Wilcox, who has attempted this a few times before, the earliest patch from over 10 years ago turned it into a semaphore, which ended up being slower than the BKL and was subsequently reverted. Someone should do some serious performance testing when this becomes a spinlock, since this has caused problems before. Using a spinlock should be at least as good as the BKL in theory, but who knows... Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Matthew Wilcox <willy@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Sage Weil <sage@newdream.net> Cc: linux-kernel@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org	2010-10-05 11:02:04 +02:00
Pavel Emelyanov	c653ce3f0a	sunrpc: Add net to rpc_create_args Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-01 17:18:56 -04:00
Pavel Emelyanov	fc5d00b04a	sunrpc: Add net argument to svc_create_xprt Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-01 17:18:54 -04:00
Trond Myklebust	aa510da5bf	NFS: We must use list_for_each_entry_safe in nfs_access_cache_shrinker We may end up removing the current entry from nfs_access_lru_list. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-09-29 15:16:25 -04:00
Jeff Layton	a00dd6c03d	NFS: don't use FLUSH_SYNC on WB_SYNC_NONE COMMIT calls (try #2 ) WB_SYNC_NONE is supposed to mean "don't wait on anything". That should also include not waiting for COMMIT calls to complete. WB_SYNC_NONE is also implied when wbc->nonblocking and wbc->for_background are set, so we can replace those checks in nfs_commit_unstable_pages with a check for WB_SYNC_NONE. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-09-29 14:42:30 -04:00
Trond Myklebust	5c78f58e2d	NFS: Really fix put_nfs_open_context() In nfs_open_revalidate(), if the open_context() call returns an inode that is not the same as dentry->d_inode, then we will call put_nfs_open_context() with a valid dentry->d_inode, but without the context being part of the nfsi->open_files list. In this case too, we want to just skip the list removal, but we do want to call the ->close_context() callback in order to close the NFSv4 state. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Jeff Layton <jlayton@redhat.com>	2010-09-29 14:41:36 -04:00

... 2 3 4 5 6 ...

2004 Commits