linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-05 03:21:32 +00:00

Author	SHA1	Message	Date
Kinglong Mee	a1420384e3	NFSD: Put exports after nfsd4_layout_verify fail Fix commit `9cf514ccfa` (nfsd: implement pNFS operations). Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-03-20 16:15:42 -04:00
Christoph Hellwig	31ef83dc05	nfsd: add trace events For now just a few simple events to trace the layout stateid lifetime, but these already were enough to find several bugs in the Linux client layout stateid handling. Signed-off-by: Christoph Hellwig <hch@lst.de>	2015-02-02 18:09:44 +01:00
Christoph Hellwig	c5c707f96f	nfsd: implement pNFS layout recalls Add support to issue layout recalls to clients. For now we only support full-file recalls to get a simple and stable implementation. This allows us to embed a nfsd4_callback structure in the layout_state and thus avoid any memory allocations under spinlocks during a recall. For normal use cases that do not intend to share a single file between multiple clients this implementation is fully sufficient. To ensure layouts are recalled on local filesystem access each layout state registers a new FL_LAYOUT lease with the kernel file locking code, which filesystems that support pNFS exports that require recalls need to break on conflicting access patterns. The XDR code is based on the old pNFS server implementation by Andy Adamson, Benny Halevy, Boaz Harrosh, Dean Hildebrand, Fred Isaman, Marc Eshel, Mike Sager and Ricardo Labiaga. Signed-off-by: Christoph Hellwig <hch@lst.de>	2015-02-02 18:09:43 +01:00
Christoph Hellwig	9cf514ccfa	nfsd: implement pNFS operations Add support for the GETDEVICEINFO, LAYOUTGET, LAYOUTCOMMIT and LAYOUTRETURN NFSv4.1 operations, as well as backing code to manage outstanding layouts and devices. Layout management is very straightforward, with a nfs4_layout_stateid structure that extends nfs4_stid to manage layout stateids as the top-level structure. It is linked into the nfs4_file and nfs4_client structures like the other stateids, and contains a linked list of layouts that hang off the stateid. The actual layout operations are implemented in layout drivers that are not part of this commit, but will be added later. The worst part of this commit is the management of the pNFS device IDs, which suffers from a specification that is not sanely implementable due to the fact that the device-IDs are global and not bound to an export, and have a small enough size so that we can't store the fsid portion of a file handle, and must never be reused. As we still do need to perform all export authentication and validation checks on a device ID passed to GETDEVICEINFO we are caught between a rock and a hard place. To work around this issue we add a new hash that maps from a 64-bit integer to a fsid so that we can look up the export to authenticate against it, a 32-bit integer as a generation that we can bump when changing the device, and a currently unused 32-bit integer that could be used in the future to handle more than a single device per export. Entries in this hash table are never deleted as we can't reuse the ids anyway, and would have a severe lifetime problem anyway as Linux export structures are temporary structures that can go away under load. Parts of the XDR data, structures and marshaling/unmarshaling code, as well as many concepts are derived from the old pNFS server implementation from Andy Adamson, Benny Halevy, Dean Hildebrand, Marc Eshel, Fred Isaman, Mike Sager, Ricardo Labiaga and many others. Signed-off-by: Christoph Hellwig <hch@lst.de>	2015-02-02 18:09:42 +01:00
Jeff Layton	779fb0f3af	sunrpc: move rq_splice_ok flag into rq_flags Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-12-09 11:22:21 -05:00
Jeff Layton	30660e04b0	sunrpc: move rq_usedeferral flag to rq_flags Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-12-09 11:22:20 -05:00
Anna Schumaker	b0cb908523	nfsd: Add DEALLOCATE support DEALLOCATE only returns a status value, meaning we can use the noop() xdr encoder to reply to the client. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-11-07 16:20:15 -05:00
Anna Schumaker	95d871f03c	nfsd: Add ALLOCATE support The ALLOCATE operation is used to preallocate space in a file. I can do this by using vfs_fallocate() to do the actual preallocation. ALLOCATE only returns a status indicator, so we don't need to write a special encode() function. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-11-07 16:19:49 -05:00
J. Bruce Fields	51904b0807	nfsd4: fix crash on unknown operation number Unknown operation numbers are caught in nfsd4_decode_compound() which sets op->opnum to OP_ILLEGAL and op->status to nfserr_op_illegal. The error causes the main loop in nfsd4_proc_compound() to skip most processing. But nfsd4_proc_compound also peeks ahead at the next operation in one case and doesn't take similar precautions there. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-10-23 13:39:51 -04:00
J. Bruce Fields	d1d84c9626	nfsd4: fix response size estimation for OP_SEQUENCE We added this new estimator function but forgot to hook it up. The effect is that NFSv4.1 (and greater) won't do zero-copy reads. The estimate was also wrong by 8 bytes. Fixes: `ccae70a9ee` "nfsd4: estimate sequence response size" Cc: stable@vger.kernel.org Reported-by: Chuck Lever <chucklever@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-10-21 09:10:50 -04:00
Anna Schumaker	24bab49122	NFSD: Implement SEEK This patch adds server support for the NFS v4.2 operation SEEK, which returns the position of the next hole or data segment in a file. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-29 14:35:20 -04:00
Trond Myklebust	3234975f47	nfsd: Remove nfs4_lock_state(): nfsd4_open and nfsd4_open_confirm Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-08-05 10:55:16 -04:00
Jeff Layton	58fb12e6a4	nfsd: Add a mutex to protect the NFSv4.0 open owner replay cache We don't want to rely on the client_mutex for protection in the case of NFSv4 open owners. Instead, we add a mutex that will only be taken for NFSv4.0 state mutating operations, and that will be released once the entire compound is done. Also, ensure that nfsd4_cstate_assign_replay/nfsd4_cstate_clear_replay take a reference to the stateowner when they are using it for NFSv4.0 open and lock replay caching. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-31 14:20:19 -04:00
Jeff Layton	b3fbfe0e7a	nfsd: print status when nfsd4_open fails to open file it just created It's possible for nfsd to fail opening a file that it has just created. When that happens, we throw a WARN but it doesn't include any info about the error code. Print the status code to give us a bit more info. Our QA group hit some of these warnings under some very heavy stress testing. My suspicion is that they hit the file-max limit, but it's hard to know for sure. Go ahead and add a -ENFILE mapping to nfserr_serverfault to make the error more distinct (and correct). Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-29 23:08:38 -04:00
Trond Myklebust	0fe492db60	nfsd: Convert nfs4_check_open_reclaim() to work with lookup_clientid() lookup_clientid is preferable to find_confirmed_client since it's able to use the cached client in the compound state. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-09 20:55:07 -04:00
Kinglong Mee	1e444f5bc0	NFSD: Remove iattr parameter from nfsd_symlink() Commit `db2e747b14` (vfs: remove mode parameter from vfs_symlink()) removed the mode parameter from vfs_symlink(), so nfsd_symlink() no longer needs iattr; just remove it. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:31 -04:00
J. Bruce Fields	7fb84306f5	nfsd4: rename cr_linkname->cr_data The name of a link is currently stored in cr_name and cr_namelen, and the content in cr_linkname and cr_linklen. That's confusing. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:24 -04:00
J. Bruce Fields	52ee04330f	nfsd: let nfsd_symlink assume null-terminated data Currently nfsd_symlink has a weird hack to serve callers who don't null-terminate symlink data: it looks ahead at the next byte to see if it's zero, and copies it to a new buffer to null-terminate if not. That means callers don't have to null-terminate, but they do have to ensure that the byte following the end of the data is theirs to read. That's a bit subtle, and the NFSv4 code actually got this wrong. So let's just throw out that code and let callers pass null-terminated strings; we've already fixed them to do that. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:23 -04:00
J. Bruce Fields	b829e9197a	nfsd: fix rare symlink decoding bug An NFS operation that creates a new symlink includes the symlink data, which is xdr-encoded as a length followed by the data plus 0 to 3 bytes of zero-padding as required to reach a 4-byte boundary. The vfs, on the other hand, wants null-terminated data. The simple way to handle this would be by copying the data into a newly allocated buffer with space for the final null. The current nfsd_symlink code tries to be more clever by skipping that step in the (likely) case where the byte following the string is already 0. But that assumes that the byte following the string is ours to look at. In fact, it might be the first byte of a page that we can't read, or of some object that another task might modify. Worse, the NFSv4 code tries to fix the problem by actually writing to that byte. In the NFSv2/v3 cases this actually appears to be safe: - nfs3svc_decode_symlinkargs explicitly null-terminates the data (after first checking its length and copying it to a new page). - NFSv2 limits symlinks to 1k. The buffer holding the rpc request is always at least a page, and the link data (and previous fields) have maximum lengths that prevent the request from reaching the end of a page. In the NFSv4 case the CREATE op is potentially just one part of a long compound so can end up on the end of a page if you're unlucky. The minimal fix here is to copy and null-terminate in the NFSv4 case. The nfsd_symlink() interface here seems too fragile, though. It should really either do the copy itself every time or just require a null-terminated string. Reported-by: Jeff Layton <jlayton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-07-08 17:14:22 -04:00
Jeff Layton	f419992c1f	nfsd: add __force to opaque verifier field casts sparse complains that we're stuffing non-byte-swapped values into __be32's here. Since they're supposed to be opaque, it doesn't matter much. Just add __force to make sparse happy. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-23 11:31:37 -04:00
Kinglong Mee	bf18f163e8	NFSD: Using exp_get for export getting Don't using cache_get besides export.h, using exp_get for export. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-23 11:31:36 -04:00
Kinglong Mee	f15a5cf912	SUNRPC/NFSD: Change to type of bool for rq_usedeferral and rq_splice_ok rq_usedeferral and rq_splice_ok are used as 0 and 1, just defined to bool. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-23 11:31:36 -04:00
Kinglong Mee	3c7aa15d20	NFSD: Using min/max/min_t/max_t for calculate Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-23 11:31:36 -04:00
J. Bruce Fields	05638dc73a	nfsd4: simplify server xdr->next_page use The rpc code makes available to the NFS server an array of pages to encode into. The server represents its reply as an xdr buf, with the head pointing into the first page in that array, the pages ** array starting just after that, and the tail (if any) sharing any leftover space in the page used by the head. While encoding, we use xdr_stream->page_ptr to keep track of which page we're currently using. Currently we set xdr_stream->page_ptr to buf->pages, which makes the head a weird exception to the rule that page_ptr always points to the page we're currently encoding into. So, instead set it to buf->pages - 1 (the page actually containing the head), and remove the need for a little unintuitive logic in xdr_get_next_encode_buffer() and xdr_truncate_encode. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-06 19:22:46 -04:00
Jeff Layton	7025005d5e	nfsd: remove unneeded zeroing of fields in nfsd4_proc_compound The memset of resp in svc_process_common should ensure that these are already zeroed by the time they get here. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-04 15:42:03 -04:00
Jeff Layton	ba5378b66f	nfsd: fix setting of NFS4_OO_CONFIRMED in nfsd4_open In the NFS4_OPEN_CLAIM_PREVIOUS case, we should only mark it confirmed if the nfs4_check_open_reclaim check succeeds. In the NFS4_OPEN_CLAIM_DELEG_PREV_FH and NFS4_OPEN_CLAIM_DELEGATE_PREV cases, I see no point in declaring the openowner confirmed when the operation is going to fail anyway, and doing so might allow the client to game things such that it wouldn't need to confirm a subsequent open with the same owner. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-06-04 15:42:02 -04:00
J. Bruce Fields	a5cddc885b	nfsd4: better reservation of head space for krb5 RPC_MAX_AUTH_SIZE is scattered around several places. Better to set it once in the auth code, where this kind of estimate should be made. And while we're at it we can leave it zero when we're not using krb5i or krb5p. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:17 -04:00
J. Bruce Fields	ccae70a9ee	nfsd4: estimate sequence response size Otherwise a following patch would turn off all 4.1 zero-copy reads. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:07 -04:00
J. Bruce Fields	b86cef60da	nfsd4: better estimate of getattr response size We plan to use this estimate to decide whether or not to allow zero-copy reads. Currently we're assuming all getattr's are a page, which can be both too small (ACLs e.g. may be arbitrarily long) and too large (after an upcoming read patch this will unnecessarily prevent zero copy reads in any read compound also containing a getattr). Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:06 -04:00
J. Bruce Fields	561f0ed498	nfsd4: allow large readdirs Currently we limit readdir results to a single page. This can result in a performance regression compared to NFSv3 when reading large directories. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:32:03 -04:00
J. Bruce Fields	4f0cefbf38	nfsd4: more precise nfsd4_max_reply It will turn out to be useful to have a more accurate estimate of reply size; so, piggyback on the existing op reply-size estimators. Also move nfsd4_max_reply to nfs4proc.c to get easier access to struct nfsd4_operation and friends. (Thanks to Christoph Hellwig for pointing out that simplification.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:57 -04:00
J. Bruce Fields	8c7424cff6	nfsd4: don't try to encode conflicting owner if low on space I ran into this corner case in testing: in theory clients can provide state owners up to 1024 bytes long. In the sessions case there might be a risk of this pushing us over the DRC slot size. The conflicting owner isn't really that important, so let's humor a client that provides a small maxresponsize_cached by allowing ourselves to return without the conflicting owner instead of outright failing the operation. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:55 -04:00
J. Bruce Fields	2825a7f907	nfsd4: allow encoding across page boundaries After this we can handle for example getattr of very large ACLs. Read, readdir, readlink are still special cases with their own limits. Also we can't handle a new operation starting close to the end of a page. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:54 -04:00
J. Bruce Fields	a8095f7e80	nfsd4: size-checking cleanup Better variable name, some comments, etc. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:53 -04:00
J. Bruce Fields	ea8d7720b2	nfsd4: remove redundant encode buffer size checking Now that all op encoders can handle running out of space, we no longer need to check the remaining size for every operation; only nonidempotent operations need that check, and that can be done by nfsd4_check_resp_size. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:52 -04:00
J. Bruce Fields	d0a381dd0e	nfsd4: teach encoders to handle reserve_space failures We've tried to prevent running out of space with COMPOUND_SLACK_SPACE and special checking in those operations (getattr) whose result can vary enormously. However: - COMPOUND_SLACK_SPACE may be difficult to maintain as we add more protocol. - BUG_ON or page faulting on failure seems overly fragile. - Especially in the 4.1 case, we prefer not to fail compounds just because the returned result came close to session limits. (Though perfect enforcement here may be difficult.) - I'd prefer encoding to be uniform for all encoders instead of having special exceptions for encoders containing, for example, attributes. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-30 17:31:49 -04:00
J. Bruce Fields	6ac90391c6	nfsd4: keep xdr buf length updated Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-28 14:52:38 -04:00
J. Bruce Fields	d3f627c815	nfsd4: use xdr_stream throughout compound encoding Note this makes ADJUST_ARGS useless; we'll remove it in the following patch. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-28 14:52:35 -04:00
J. Bruce Fields	ddd1ea5636	nfsd4: use xdr_reserve_space in attribute encoding This is a cosmetic change for now; no change in behavior. Note we're just depending on xdr_reserve_space to do the bounds checking for us, we're not really depending on its adjustment of iovec or xdr_buf lengths yet, as those are fixed up as necessary after the fact by read-link operations and by nfs4svc_encode_compoundres. However we do have to update xdr->iov on read-like operations to prevent xdr_reserve_space from messing with the already-fixed-up length of the head. When the attribute encoding fails partway through we have to undo the length adjustments made so far. We do it manually for now, but later patches will add an xdr_truncate_encode() helper to handle cases like this. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-28 14:52:34 -04:00
J. Bruce Fields	07d1f80207	nfsd4: fix encoding of out-of-space replies If nfsd4_check_resp_size() returns an error then we should really be truncating the reply here, otherwise we may leave extra garbage at the end of the rpc reply. Also add a warning to catch any cases where our reply-size estimates may be wrong in the case of a non-idempotent operation. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-27 11:09:08 -04:00
J. Bruce Fields	1802a67894	nfsd4: reserve head space for krb5 integ/priv info Currently if the nfs-level part of a reply would be too large, we'll return an error to the client. But if the nfs-level part fits and leaves no room for krb5p or krb5i stuff, then we just drop the request entirely. That's no good. Instead, reserve some slack space at the end of the buffer and make sure we fail outright if we'd come close. The slack space here is a massive overestimate of what's required; we should probably try for a tighter limit at some point. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-23 09:03:47 -04:00
J. Bruce Fields	2d124dfaad	nfsd4: move proc_compound xdr encode init to helper Mechanical transformation with no change of behavior. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-23 09:03:46 -04:00
J. Bruce Fields	d518465866	nfsd4: tweak nfsd4_encode_getattr to take xdr_stream Just change the nfsd4_encode_getattr api. Not changing any code or adding any new functionality yet. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-23 09:03:46 -04:00
J. Bruce Fields	4aea24b2ff	nfsd4: embed xdr_stream in nfsd4_compoundres This is a mechanical transformation with no change in behavior. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-23 09:03:45 -04:00
J. Bruce Fields	e372ba60de	nfsd4: decoding errors can still be cached and require space Currently a non-idempotent op reply may be cached if it fails in the proc code but not if it fails at xdr decoding. I doubt there are any xdr-decoding-time errors that would make this a problem in practice, so this probably isn't a serious bug. The space estimates should also take into account space required for encoding of error returns. Again, not a practical problem, though it would become one after future patches which will tighten the space estimates. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-23 09:03:44 -04:00
J. Bruce Fields	f34e432b67	nfsd4: fix write reply size estimate The write reply also includes count and stable_how. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-23 09:03:43 -04:00
J. Bruce Fields	622f560e6a	nfsd4: read size estimate should include padding Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-23 09:03:42 -04:00
J. Bruce Fields	5b648699af	nfsd4: READ, READDIR, etc., are idempotent OP_MODIFIES_SOMETHING flags operations that we should be careful not to initiate without being sure we have the buffer space to encode a reply. None of these ops fall into that category. We could probably remove a few more, but this isn't a very important problem at least for ops whose reply size is easy to estimate. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-23 09:03:41 -04:00
Trond Myklebust	14bcab1a39	NFSd: Clean up nfs4_preprocess_stateid_op Move the state locking and file descriptor reference out from the callers and into nfs4_preprocess_stateid_op() itself. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-05-07 11:05:48 -04:00
Kinglong Mee	2336745e87	NFSD: Clear wcc data between compound ops Testing NFSv4.0 with pynfs, I got messages such as "nfsd: inode locked twice during operation." When one compound RPC contains two or more ops that lock the filehandle, the second op triggers the message. For example, with two SETATTR ops: after the first SETATTR, nfsd does not call fh_put() to release the current filehandle, so the filehandle is unlocked but still has fh_post_saved = 1. The second SETATTR finds fh_post_saved = 1 and printks the message. v2: introduce helper fh_clear_wcc(). Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-03-30 10:47:34 -04:00

224 Commits