Commit Graph

2069 Commits

Author SHA1 Message Date
J. Bruce Fields
5d6031ca74 nfsd4: zero op arguments beyond the 8th compound op
The first 8 ops of the compound are zeroed since they're a part of the
argument that's zeroed by the

	memset(rqstp->rq_argp, 0, procp->pc_argsize);

in svc_process_common().  But we handle larger compounds by allocating
the memory on the fly in nfsd4_decode_compound().  Other than code
recently fixed by 01529e3f81 "NFSD: Fix memory leak in encoding denied
lock", I don't know of any examples of code depending on this
initialization. But it definitely seems possible, and I'd rather be
safe.

Compounds this long are unusual so I'm much more worried about failure
in this poorly tested cases than about an insignificant performance hit.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-17 16:20:39 -04:00
Jeff Layton
ae4b884fc6 nfsd: silence sparse warning about accessing credentials
sparse says:

    fs/nfsd/auth.c:31:38: warning: incorrect type in argument 1 (different address spaces)
    fs/nfsd/auth.c:31:38:    expected struct cred const *cred
    fs/nfsd/auth.c:31:38:    got struct cred const [noderef] <asn:4>*real_cred

Add a new accessor for the ->real_cred and use that to fetch the
pointer. Accessing current->real_cred directly is actually quite safe
since we know that they can't go away so this is mostly a cosmetic fixup
to silence sparse.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-17 16:15:35 -04:00
Trond Myklebust
b0fc29d6fc nfsd: Ensure stateids remain unique until they are freed
Add an extra delegation state to allow the stateid to remain in the idr
tree until the last reference has been released. This will be necessary
to ensure uniqueness once the client_mutex is removed.

[jlayton: reset the sc_type under the state_lock in unhash_delegation]

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-16 21:39:51 -04:00
Jeff Layton
d564fbec7a nfsd: nfs4_alloc_init_lease should take a nfs4_file arg
No need to pass the delegation pointer in here as it's only used to get
the nfs4_file pointer.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-16 21:35:25 -04:00
Jeff Layton
02e1215f9f nfsd: Avoid taking state_lock while holding inode lock in nfsd_break_one_deleg
state_lock is a heavily contended global lock. We don't want to grab
that while simultaneously holding the inode->i_lock.

Add a new per-nfs4_file lock that we can use to protect the
per-nfs4_file delegation list. Hold that while walking the list in the
break_deleg callback and queue the workqueue job for each one.

The workqueue job can then take the state_lock and do the list
manipulations without the i_lock being held prior to starting the
rpc call.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-16 21:06:12 -04:00
Jeff Layton
e8051c837b nfsd: eliminate nfsd4_init_callback
It's just an obfuscated INIT_WORK call. Just make the work_func_t a
non-static symbol and use a normal INIT_WORK call.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-16 14:18:58 -04:00
Kinglong Mee
d5d5c304b1 NFSD: Fix bad checking of space for padding in splice read
Note that the caller has already reserved space for count and eof, so
xdr->p has already moved past them, only the padding remains.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Fixes dc97618ddd (nfsd4: separate splice and readv cases)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-11 15:19:25 -04:00
Kinglong Mee
35e634b83c NFSD: Check acl returned from get_acl/posix_acl_from_mode
Commit 4ac7249ea5 (nfsd: use get_acl and ->set_acl)
don't check the acl returned from get_acl()/posix_acl_from_mode().

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-11 15:03:53 -04:00
Jeff Layton
a46cb7f287 nfsd: cleanup and rename nfs4_check_open
Rename it to better describe what it does, and have it just return the
stateid instead of a __be32 (which is now always nfs_ok). Also, do the
search for an existing stateid after the delegation check, to reduce
cleanup if the delegation check returns error.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-11 11:06:38 -04:00
Jeff Layton
baeb4ff0e5 nfsd: make deny mode enforcement more efficient and close races in it
The current enforcement of deny modes is both inefficient and scattered
across several places, which makes it hard to guarantee atomicity. The
inefficiency is a problem now, and the lack of atomicity will mean races
once the client_mutex is removed.

First, we address the inefficiency. We have to track deny modes on a
per-stateid basis to ensure that open downgrades are sane, but when the
server goes to enforce them it has to walk the entire list of stateids
and check against each one.

Instead of doing that, maintain a per-nfs4_file deny mode. When a file
is opened, we simply set any deny bits in that mode that were specified
in the OPEN call. We can then use that unified deny mode to do a simple
check to see whether there are any conflicts without needing to walk the
entire stateid list.

The only time we'll need to walk the entire list of stateids is when a
stateid that has a deny mode on it is being released, or one is having
its deny mode downgraded. In that case, we must walk the entire list and
recalculate the fi_share_deny field. Since deny modes are pretty rare
today, this should be very rare under normal workloads.

To address the potential for races once the client_mutex is removed,
protect fi_share_deny with the fi_lock. In nfs4_get_vfs_file, check to
make sure that any deny mode we want to apply won't conflict with
existing access. If that's ok, then have nfs4_file_get_access check that
new access to the file won't conflict with existing deny modes.

If that also passes, then get file access references, set the correct
access and deny bits in the stateid, and update the fi_share_deny field.
If opening the file or truncating it fails, then unwind the whole mess
and return the appropriate error.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-11 11:06:32 -04:00
Jeff Layton
7214e8600e nfsd: always hold the fi_lock when bumping fi_access refcounts
Once we remove the client_mutex, there's an unlikely but possible race
that could occur. It will be possible for nfs4_file_put_access to race
with nfs4_file_get_access. The refcount will go to zero (briefly) and
then bumped back to one. If that happens we set ourselves up for a
use-after-free and the potential for a lock to race onto the i_flock
list as a filp is being torn down.

Ensure that we can safely bump the refcount on the file by holding the
fi_lock whenever that's done. The only place it currently isn't is in
get_lock_access.

In order to ensure atomicity with finding the file, use the
find_*_file_locked variants and then call get_lock_access to get new
access references on the nfs4_file under the same lock.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-11 11:06:17 -04:00
Jeff Layton
3b84240a7b nfsd: clean up reset_union_bmap_deny
Fix the "deny" argument type, and start the loop at 1. The 0 iteration
is always a noop.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-11 11:06:11 -04:00
Jeff Layton
6eb3a1d096 nfsd: set stateid access and deny bits in nfs4_get_vfs_file
Cleanup -- ensure that the stateid bits are set at the same time that
the file access refcounts are incremented. Keeping them coherent like
this makes it easier to ensure that we account for all of the
references.

Since the initialization of the st_*_bmap fields is done when it's
hashed, we go ahead and hash the stateid before getting access to the
file and unhash it if that function returns error. This will be
necessary anyway in a follow-on patch that will overhaul deny mode
handling.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-11 11:06:05 -04:00
Jeff Layton
c11c591fe6 nfsd: shrink st_access_bmap and st_deny_bmap
We never use anything above bit #3, so an unsigned long for each is
wasteful. Shrink them to a char each, and add some WARN_ON_ONCE calls if
we try to set or clear bits that would go outside those sizes.

Note too that because atomic bitops work on unsigned longs, we have to
abandon their use here. That shouldn't be a problem though since we
don't really care about the atomicity in this code anyway. Using them
was just a convenient way to flip bits.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-11 11:06:04 -04:00
Jeff Layton
6d338b51eb nfsd: remove nfs4_file_put_fd
...and replace it with a simple swap call.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-11 11:05:57 -04:00
Jeff Layton
1265965172 nfsd: refactor nfs4_file_get_access and nfs4_file_put_access
Have them take NFS4_SHARE_ACCESS_* flags instead of an open mode. This
spares the callers from having to convert it themselves.

This also allows us to simplify these functions as we no longer need
to do the access_to_omode conversion in either one.

Note too that this patch eliminates the WARN_ON in
__nfs4_file_get_access. It's valid for now, but in a later patch we'll
be bumping the refcounts prior to opening the file in order to close
some races, at which point we'll need to remove it anyway.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-11 11:03:23 -04:00
Trond Myklebust
e20fcf1e65 nfsd: clean up helper __release_lock_stateid
Use filp_close instead of open coding. filp_close does a bit more than
just release the locks and put the filp. It also calls ->flush and
dnotify_flush, both of which should be done here anyway.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-10 15:05:26 -04:00
Trond Myklebust
de18643dce nfsd: Add locking to the nfs4_file->fi_fds[] array
Preparation for removal of the client_mutex, which currently protects
this array. While we don't actually need the find_*_file_locked variants
just yet, a later patch will. So go ahead and add them now to reduce
future churn in this code.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-10 15:05:26 -04:00
Trond Myklebust
1d31a2531a nfsd: Add fine grained protection for the nfs4_file->fi_stateids list
Access to this list is currently serialized by the client_mutex. Add
finer grained locking around this list in preparation for its removal.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-10 15:05:25 -04:00
Jeff Layton
d6c249b4d4 nfsd: reduce some spinlocking in put_client_renew
No need to take the lock unless the count goes to 0.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-10 13:41:00 -04:00
Jeff Layton
dff1399f8a nfsd: close potential race between delegation break and laundromat
Bruce says:

    There's also a preexisting expire_client/laundromat vs break race:

    - expire_client/laundromat adds a delegation to its local
      reaplist using the same dl_recall_lru field that a delegation
      uses to track its position on the recall lru and drops the
      state lock.

    - a concurrent break_lease adds the delegation to the lru.

    - expire/client/laundromat then walks it reaplist and sees the
      lru head as just another delegation on the list....

Fix this race by checking the dl_time under the state_lock. If we find
that it's not 0, then we know that it has already been queued to the LRU
list and that we shouldn't queue it again.

In the case of destroy_client, we must also ensure that we don't hit
similar races by ensuring that we don't move any delegations to the
reaplist with a dl_time of 0. Just bump the dl_time by one before we
drop the state_lock. We're destroying the delegations anyway, so a 1s
difference there won't matter.

The fault injection code also requires a bit of surgery here:

First, in the case of nfsd_forget_client_delegations, we must prevent
the same sort of race vs. the delegation break callback. For that, we
just increment the dl_time to ensure that a delegation callback can't
race in while we're working on it.

We can't do that for nfsd_recall_client_delegations, as we need to have
it actually queue the delegation, and that won't happen if we increment
the dl_time. The state lock is held over that function, so we don't need
to worry about these sorts of races there.

There is one other potential bug nfsd_recall_client_delegations though.
Entries on the victims list are not dequeued before calling
nfsd_break_one_deleg. That's a potential list corruptor, so ensure that
we do that there.

Reported-by: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-10 13:40:51 -04:00
Kinglong Mee
01529e3f81 NFSD: Fix memory leak in encoding denied lock
Commit 8c7424cff6 (nfsd4: don't try to encode conflicting owner if low on space)
forgot free conf->data in nfsd4_encode_lockt and before sign conf->data to NULL
in nfsd4_encode_lock_denied.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:55:08 -04:00
Trond Myklebust
0fe492db60 nfsd: Convert nfs4_check_open_reclaim() to work with lookup_clientid()
lookup_clientid is preferable to find_confirmed_client since it's able
to use the cached client in the compound state.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:55:07 -04:00
Trond Myklebust
2d91e8953c nfsd: Always use lookup_clientid() in nfsd4_process_open1
In later patches, we'll be moving the stateowner table into the
nfs4_client, and by doing this we ensure that we have a cached
nfs4_client pointer.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:55:06 -04:00
Trond Myklebust
13d6f66b08 nfsd: Convert nfsd4_process_open1() to work with lookup_clientid()
...and have alloc_init_open_stateowner just use the cstate->clp pointer
instead of passing in a clp separately. This allows us to use the
cached nfs4_client pointer in the cstate instead of having to look it
up again.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:55:05 -04:00
Jeff Layton
4b24ca7d30 nfsd: Allow struct nfsd4_compound_state to cache the nfs4_client
We want to use the nfsd4_compound_state to cache the nfs4_client in
order to optimise away extra lookups of the clid.

In the v4.0 case, we use this to ensure that we only have to look up the
client at most once per compound for each call into lookup_clientid. For
v4.1+ we set the pointer in the cstate during SEQUENCE processing so we
should never need to do a search for it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:55:04 -04:00
Jeff Layton
62814d6a9b nfsd: add a nfserrno mapping for -E2BIG to nfserr_fbig
I saw this pop up with some pynfs testing:

    [  123.609992] nfsd: non-standard errno: -7

...and -7 is -E2BIG. I think what happened is that XFS returned -E2BIG
due to some xattr operations with the ACL10 pynfs TEST (I guess it has
limited xattr size?).

Add a better mapping for that error since it's possible that we'll need
it. How about we convert it to NFSERR_FBIG? As Bruce points out, they
both have "BIG" in the name so it must be good.

Also, turn the printk in this function into a WARN() so that we can get
a bit more information about situations that don't have proper mappings.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:55:03 -04:00
Jeff Layton
722b620d18 nfsd: properly convert return from commit_metadata to __be32
Commit 2a7420c03e504 (nfsd: Ensure that nfsd_create_setattr commits
files to stable storage), added a couple of calls to commit_metadata,
but doesn't convert their return codes to __be32 in the appropriate
places.

Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:55:02 -04:00
Trond Myklebust
2dd6e458c3 nfsd: Cleanup - Let nfsd4_lookup_stateid() take a cstate argument
The cstate already holds information about the session, and hence
the client id, so it makes more sense to pass that information
rather than the current practice of passing a 'minor version' number.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:55:01 -04:00
Trond Myklebust
d4e19e7027 nfsd: Don't get a session reference without a client reference
If the client were to disappear from underneath us while we're holding
a session reference, things would be bad. This cleanup helps ensure
that it cannot, which will be a possibility when the client_mutex is
removed.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:55:00 -04:00
Jeff Layton
fd44907c2d nfsd: clean up nfsd4_release_lockowner
Now that we know that we won't have several lockowners with the same,
owner->data, we can simplify nfsd4_release_lockowner and get rid of
the lo_list in the process.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:54:59 -04:00
Trond Myklebust
b3c32bcd9c nfsd: NFSv4 lock-owners are not associated to a specific file
Just like open-owners, lock-owners are associated with a name, a clientid
and, in the case of minor version 0, a sequence id. There is no association
to a file.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:54:58 -04:00
Jeff Layton
c53530da4d nfsd: Allow lockowners to hold several stateids
A lockowner can have more than one lock stateid. For instance, if a
process has more than one file open and has locks on both, then the same
lockowner has more than one stateid associated with it. Change it so
that this reality is better reflected by the objects that nfsd uses.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-09 20:54:57 -04:00
Trond Myklebust
3c87b9b7c0 nfsd: lock owners are not per open stateid
In the NFSv4 spec, lock stateids are per-file objects. Lockowners are not.
This patch replaces the current list of lock owners in the open stateids
with a list of lock stateids.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 17:14:37 -04:00
Trond Myklebust
acf9295b1c nfsd: clean up nfsd4_close_open_stateid
Minor cleanup that should introduce no behavioral changes.

Currently this function just unhashes the stateid and leaves the caller
to do the work of the CLOSE processing.

Change nfsd4_close_open_stateid so that it handles doing all of the work
of closing a stateid. Move the handling of the unhashed stateid into it
instead of doing that work in nfsd4_close. This will help isolate some
coming changes to stateid handling from nfsd4_close.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 17:14:36 -04:00
Jeff Layton
db24b3b4b2 nfsd: declare v4.1+ openowners confirmed on creation
There's no need to confirm an openowner in v4.1 and above, so we can
go ahead and set NFS4_OO_CONFIRMED when we create openowners in
those versions. This will also be necessary when we remove the
client_mutex, as it'll be possible for two concurrent opens to race
in versions >4.0.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 17:14:35 -04:00
Trond Myklebust
b607664ee7 nfsd: Cleanup nfs4svc_encode_compoundres
Move the slot return, put session etc into a helper in fs/nfsd/nfs4state.c
instead of open coding in nfs4svc_encode_compoundres.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 17:14:34 -04:00
Trond Myklebust
e17f99b728 nfsd: nfs4_preprocess_seqid_op should only set *stpp on success
Not technically a bugfix, since nothing tries to use the return pointer
if this function doesn't return success, but it could be a problem
with some coming changes.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 17:14:33 -04:00
Jeff Layton
5b8db00bae nfsd: add a new /proc/fs/nfsd/max_connections file
Currently, the maximum number of connections that nfsd will allow
is based on the number of threads spawned. While this is fine for a
default, there really isn't a clear relationship between the two.

The number of threads corresponds to the number of concurrent requests
that we want to allow the server to process at any given time. The
connection limit corresponds to the maximum number of clients that we
want to allow the server to handle. These are two entirely different
quantities.

Break the dependency on increasing threads in order to allow for more
connections, by adding a new per-net parameter that can be set to a
non-zero value. The default is still to base it on the number of threads,
so there should be no behavior change for anyone who doesn't use it.

Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 17:14:32 -04:00
Trond Myklebust
0f3a24b43b nfsd: Ensure that nfsd_create_setattr commits files to stable storage
Since nfsd_create_setattr strips the mode from the struct iattr, it
is quite possible that it will optimise away the call to nfsd_setattr
altogether.
If this is the case, then we never call commit_metadata() on the
newly created file.

Also ensure that both nfsd_setattr() and nfsd_create_setattr() fail
when the call to commit_metadata fails.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 17:14:31 -04:00
Kinglong Mee
1e444f5bc0 NFSD: Remove iattr parameter from nfsd_symlink()
Commit db2e747b14 (vfs: remove mode parameter from vfs_symlink())
have remove mode parameter from vfs_symlink.
So that, iattr isn't needed by nfsd_symlink now, just remove it.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 17:14:31 -04:00
Trond Myklebust
950e0118d0 nfsd: Protect addition to the file_hashtbl
Current code depends on the client_mutex to guarantee a single struct
nfs4_file per inode in the file_hashtbl and make addition atomic with
respect to lookup.  Rely instead on the state_Lock, to make it easier to
stop taking the client_mutex here later.

To prevent an i_lock/state_lock inversion, change nfsd4_init_file to
use ihold instead if igrab. That's also more efficient anyway as we
definitely hold a reference to the inode at that point.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 17:14:30 -04:00
Christoph Hellwig
7e6a72e5f1 nfsd: fix file access refcount leak when nfsd4_truncate fails
nfsd4_process_open2 will currently will get access to the file, and then
call nfsd4_truncate to (possibly) truncate it. If that operation fails
though, then the access references will never be released as the
nfs4_ol_stateid is never initialized.

Fix by moving the nfsd4_truncate call into nfs4_get_vfs_file, ensuring
that the refcounts are properly put if the truncate fails.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 17:14:29 -04:00
Kinglong Mee
1055414fe1 NFSD: Avoid warning message when compile at i686 arch
fs/nfsd/nfs4xdr.c: In function 'nfsd4_encode_readv':
>> fs/nfsd/nfs4xdr.c:3137:148: warning: comparison of distinct pointer types lacks a cast [enabled by default]
thislen = min(len, ((void *)xdr->end - (void *)xdr->p));

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 17:14:28 -04:00
J. Bruce Fields
d5e2338324 nfsd4: replace defer_free by svcxdr_tmpalloc
Avoid an extra allocation for the tmpbuf struct itself, and stop
ignoring some allocation failures.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 17:14:27 -04:00
J. Bruce Fields
bcaab953b1 nfsd4: remove nfs4_acl_new
This is a not-that-useful kmalloc wrapper.  And I'd like one of the
callers to actually use something other than kmalloc.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 17:14:27 -04:00
J. Bruce Fields
29c353b3fe nfsd4: define svcxdr_dupstr to share some common code
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 17:14:26 -04:00
J. Bruce Fields
ce043ac826 nfsd4: remove unused defer_free argument
28e05dd845 "knfsd: nfsd4: represent nfsv4 acl with array instead of
linked list" removed the last user that wanted a custom free function.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 17:14:25 -04:00
J. Bruce Fields
7fb84306f5 nfsd4: rename cr_linkname->cr_data
The name of a link is currently stored in cr_name and cr_namelen, and
the content in cr_linkname and cr_linklen.  That's confusing.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 17:14:24 -04:00
J. Bruce Fields
52ee04330f nfsd: let nfsd_symlink assume null-terminated data
Currently nfsd_symlink has a weird hack to serve callers who don't
null-terminate symlink data: it looks ahead at the next byte to see if
it's zero, and copies it to a new buffer to null-terminate if not.

That means callers don't have to null-terminate, but they *do* have to
ensure that the byte following the end of the data is theirs to read.

That's a bit subtle, and the NFSv4 code actually got this wrong.

So let's just throw out that code and let callers pass null-terminated
strings; we've already fixed them to do that.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 17:14:23 -04:00
J. Bruce Fields
0aeae33f5d nfsd: make NFSv2 null terminate symlink data
It's simple enough for NFSv2 to null-terminate the symlink data.

A bit weird (it depends on knowing that we've already read the following
byte, which is either padding or part of the mode), but no worse than
the conditional kstrdup it otherwise relies on in nfsd_symlink().

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 17:14:23 -04:00
J. Bruce Fields
b829e9197a nfsd: fix rare symlink decoding bug
An NFS operation that creates a new symlink includes the symlink data,
which is xdr-encoded as a length followed by the data plus 0 to 3 bytes
of zero-padding as required to reach a 4-byte boundary.

The vfs, on the other hand, wants null-terminated data.

The simple way to handle this would be by copying the data into a newly
allocated buffer with space for the final null.

The current nfsd_symlink code tries to be more clever by skipping that
step in the (likely) case where the byte following the string is already
0.

But that assumes that the byte following the string is ours to look at.
In fact, it might be the first byte of a page that we can't read, or of
some object that another task might modify.

Worse, the NFSv4 code tries to fix the problem by actually writing to
that byte.

In the NFSv2/v3 cases this actually appears to be safe:

	- nfs3svc_decode_symlinkargs explicitly null-terminates the data
	  (after first checking its length and copying it to a new
	  page).
	- NFSv2 limits symlinks to 1k.  The buffer holding the rpc
	  request is always at least a page, and the link data (and
	  previous fields) have maximum lengths that prevent the request
	  from reaching the end of a page.

In the NFSv4 case the CREATE op is potentially just one part of a long
compound so can end up on the end of a page if you're unlucky.

The minimal fix here is to copy and null-terminate in the NFSv4 case.
The nfsd_symlink() interface here seems too fragile, though.  It should
really either do the copy itself every time or just require a
null-terminated string.

Reported-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-08 17:14:22 -04:00
Kinglong Mee
c3a4561796 nfsd: Fix bad reserving space for encoding rdattr_error
Introduced by commit 561f0ed498 (nfsd4: allow large readdirs).

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-07 14:16:31 -04:00
Avi Kivity
69bbd9c7b9 nfs: fix nfs4d readlink truncated packet
XDR requires 4-byte alignment; nfs4d READLINK reply writes out the padding,
but truncates the packet to the padding-less size.

Fix by taking the padding into consideration when truncating the packet.

Symptoms:

	# ll /mnt/
	ls: cannot read symbolic link /mnt/test: Input/output error
	total 4
	-rw-r--r--. 1 root root  0 Jun 14 01:21 123456
	lrwxrwxrwx. 1 root root  6 Jul  2 03:33 test
	drwxr-xr-x. 1 root root  0 Jul  2 23:50 tmp
	drwxr-xr-x. 1 root root 60 Jul  2 23:44 tree

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Fixes: 476a7b1f4b (nfsd4: don't treat readlink like a zero-copy operation)
Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-07-02 17:37:13 -04:00
J. Bruce Fields
76f47128f9 nfsd: fix rare symlink decoding bug
An NFS operation that creates a new symlink includes the symlink data,
which is xdr-encoded as a length followed by the data plus 0 to 3 bytes
of zero-padding as required to reach a 4-byte boundary.

The vfs, on the other hand, wants null-terminated data.

The simple way to handle this would be by copying the data into a newly
allocated buffer with space for the final null.

The current nfsd_symlink code tries to be more clever by skipping that
step in the (likely) case where the byte following the string is already
0.

But that assumes that the byte following the string is ours to look at.
In fact, it might be the first byte of a page that we can't read, or of
some object that another task might modify.

Worse, the NFSv4 code tries to fix the problem by actually writing to
that byte.

In the NFSv2/v3 cases this actually appears to be safe:

	- nfs3svc_decode_symlinkargs explicitly null-terminates the data
	  (after first checking its length and copying it to a new
	  page).
	- NFSv2 limits symlinks to 1k.  The buffer holding the rpc
	  request is always at least a page, and the link data (and
	  previous fields) have maximum lengths that prevent the request
	  from reaching the end of a page.

In the NFSv4 case the CREATE op is potentially just one part of a long
compound so can end up on the end of a page if you're unlucky.

The minimal fix here is to copy and null-terminate in the NFSv4 case.
The nfsd_symlink() interface here seems too fragile, though.  It should
really either do the copy itself every time or just require a
null-terminated string.

Reported-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-27 16:10:46 -04:00
Jeff Layton
d4c8e34fe8 nfsd: properly handle embedded newlines in fault_injection input
Currently rpc_pton() fails to handle the case where you echo an address
into the file, as it barfs on the newline. Ensure that we NULL out the
first occurrence of any newline.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-23 11:31:38 -04:00
Jeff Layton
f7ce5d2842 nfsd: fix return of nfs4_acl_write_who
AFAICT, the only way to hit this error is to pass this function a bogus
"who" value. In that case, we probably don't want to return -1 as that
could get sent back to the client. Turn this into nfserr_serverfault,
which is a more appropriate error for a server bug like this.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-23 11:31:38 -04:00
Jeff Layton
94ec938b61 nfsd: add appropriate __force directives to filehandle generation code
The filehandle structs all use host-endian values, but will sometimes
stuff big-endian values into those fields. This is OK since these
values are opaque to the client, but it confuses sparse. Add __force to
make it clear that we are doing this intentionally.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-23 11:31:37 -04:00
Jeff Layton
e2afc81919 nfsd: nfsd_splice_read and nfsd_readv should return __be32
The callers expect a __be32 return and the functions they call return
__be32, so having these return int is just wrong. Also, nfsd_finish_read
can be made static.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-23 11:31:37 -04:00
Jeff Layton
b3d8d1284a nfsd: clean up sparse endianness warnings in nfscache.c
We currently hash the XID to determine a hash bucket to use for the
reply cache entry, which is fed into hash_32 without byte-swapping it.
Add __force to make sparse happy, and add some comments to explain
why.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-23 11:31:37 -04:00
Jeff Layton
f419992c1f nfsd: add __force to opaque verifier field casts
sparse complains that we're stuffing non-byte-swapped values into
__be32's here. Since they're supposed to be opaque, it doesn't matter
much. Just add __force to make sparse happy.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-23 11:31:37 -04:00
Kinglong Mee
bf18f163e8 NFSD: Using exp_get for export getting
Don't using cache_get besides export.h, using exp_get for export.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-23 11:31:36 -04:00
Kinglong Mee
0da22a919d NFSD: Using path_get when assigning path for export
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-23 11:31:36 -04:00
Kinglong Mee
f15a5cf912 SUNRPC/NFSD: Change to type of bool for rq_usedeferral and rq_splice_ok
rq_usedeferral and rq_splice_ok are used as 0 and 1, just defined to bool.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-23 11:31:36 -04:00
Kinglong Mee
3c7aa15d20 NFSD: Using min/max/min_t/max_t for calculate
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-23 11:31:36 -04:00
Kinglong Mee
f41c5ad2ff NFSD: fix bug for readdir of pseudofs
Commit 561f0ed498 (nfsd4: allow large readdirs) introduces a bug
about readdir the root of pseudofs.

Call xdr_truncate_encode() revert encoded name when skipping.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-17 16:42:48 -04:00
NeilBrown
6282cd5655 NFSD: Don't hand out delegations for 30 seconds after recalling them.
If nfsd needs to recall a delegation for some reason it implies that there is
contention on the file, so further delegations should not be handed out.

The current code fails to do so, and the result is effectively a
live-lock under some workloads: a client attempting a conflicting
operation on a read-delegated file receives NFS4ERR_DELAY and retries
the operation, but by the time it retries the server may already have
given out another delegation.

We could simply avoid delegations for (say) 30 seconds after any recall, but
this is probably too heavy handed.

We could keep a list of inodes (or inode numbers or filehandles) for recalled
delegations, but that requires memory allocation and searching.

The approach taken here is to use a bloom filter to record the filehandles
which are currently blocked from delegation, and to accept the cost of a few
false positives.

We have 2 bloom filters, each of which is valid for 30 seconds.   When a
delegation is recalled the filehandle is added to one filter and will remain
disabled for between 30 and 60 seconds.

We keep a count of the number of filehandles that have been added, so when
that count is zero we can bypass all other tests.

The bloom filters have 256 bits and 3 hash functions.  This should allow a
couple of dozen blocked  filehandles with minimal false positives.  If many
more filehandles are all blocked at once, behaviour will degrade towards
rejecting all delegations for between 30 and 60 seconds, then resetting and
allowing new delegations.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-17 16:42:47 -04:00
J. Bruce Fields
48385408b4 nfsd4: fix FREE_STATEID lockowner leak
27b11428b7 ("nfsd4: remove lockowner when removing lock stateid")
introduced a memory leak.

Cc: stable@vger.kernel.org
Reported-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-09 17:13:54 -04:00
Jeff Layton
1b19453d1c nfsd: don't halt scanning the DRC LRU list when there's an RC_INPROG entry
Currently, the DRC cache pruner will stop scanning the list when it
hits an entry that is RC_INPROG. It's possible however for a call to
take a *very* long time. In that case, we don't want it to block other
entries from being pruned if they are expired or we need to trim the
cache to get back under the limit.

Fix the DRC cache pruner to just ignore RC_INPROG entries.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-06 19:22:49 -04:00
J. Bruce Fields
542d1ab3c7 nfsd4: kill READ64
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-06 19:22:48 -04:00
J. Bruce Fields
06553991e7 nfsd4: kill READ32
While we're here, let's kill off a couple of the read-side macros.

Leaving the more complicated ones alone for now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-06 19:22:47 -04:00
J. Bruce Fields
05638dc73a nfsd4: simplify server xdr->next_page use
The rpc code makes available to the NFS server an array of pages to
encod into.  The server represents its reply as an xdr buf, with the
head pointing into the first page in that array, the pages ** array
starting just after that, and the tail (if any) sharing any leftover
space in the page used by the head.

While encoding, we use xdr_stream->page_ptr to keep track of which page
we're currently using.

Currently we set xdr_stream->page_ptr to buf->pages, which makes the
head a weird exception to the rule that page_ptr always points to the
page we're currently encoding into.  So, instead set it to buf->pages -
1 (the page actually containing the head), and remove the need for a
little unintuitive logic in xdr_get_next_encode_buffer() and
xdr_truncate_encode.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-06 19:22:46 -04:00
Benny Halevy
3fb87d13ce nfsd4: hash deleg stateid only on successful nfs4_set_delegation
We don't want the stateid to be found in the hash table before the delegation
is granted.

Currently this is protected by the client_mutex, but we want to break that
up and this is a necessary step toward that goal.

Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-04 15:42:04 -04:00
Benny Halevy
cdc9750500 nfsd4: rename recall_lock to state_lock
...as the name is a bit more descriptive and we've started using it for
other purposes.

Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-04 15:42:04 -04:00
Jeff Layton
7025005d5e nfsd: remove unneeded zeroing of fields in nfsd4_proc_compound
The memset of resp in svc_process_common should ensure that these are
already zeroed by the time they get here.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-04 15:42:03 -04:00
Jeff Layton
ba5378b66f nfsd: fix setting of NFS4_OO_CONFIRMED in nfsd4_open
In the NFS4_OPEN_CLAIM_PREVIOUS case, we should only mark it confirmed
if the nfs4_check_open_reclaim check succeeds.

In the NFS4_OPEN_CLAIM_DELEG_PREV_FH and NFS4_OPEN_CLAIM_DELEGATE_PREV
cases, I see no point in declaring the openowner confirmed when the
operation is going to fail anyway, and doing so might allow the client
to game things such that it wouldn't need to confirm a subsequent open
with the same owner.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-04 15:42:02 -04:00
Benny Halevy
931ee56c67 nfsd4: use recall_lock for delegation hashing
This fixes a bug in the handling of the fi_delegations list.

nfs4_setlease does not hold the recall_lock when adding to it. The
client_mutex is held, which prevents against concurrent list changes,
but nfsd_break_deleg_cb does not hold while walking it. New delegations
could theoretically creep onto the list while we're walking it there.

Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-04 15:41:52 -04:00
Jeff Layton
a832e7ae8b nfsd: fix laundromat next-run-time calculation
The laundromat uses two variables to calculate when it should next run,
but one is completely ignored at the end of the run. Merge the two and
rename the variable to be more descriptive of what it does.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 20:25:28 -04:00
Jeff Layton
da2ebce6a0 nfsd: make nfsd4_encode_fattr static
sparse says:

      CHECK   fs/nfsd/nfs4xdr.c
    fs/nfsd/nfs4xdr.c:2043:1: warning: symbol 'nfsd4_encode_fattr' was not declared. Should it be static?

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 20:25:28 -04:00
Kinglong Mee
a48fd0f9f7 SUNRPC/NFSD: Remove using of dprintk with KERN_WARNING
When debugging, rpc prints messages from dprintk(KERN_WARNING ...)
with "^A4" prefixed,

[ 2780.339988] ^A4nfsd: connect from unprivileged port: 127.0.0.1, port=35316

Trond tells,
> dprintk != printk. We have NEVER supported dprintk(KERN_WARNING...)

This patch removes using of dprintk with KERN_WARNING.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 20:25:28 -04:00
Christoph Hellwig
b52bd7bccc nfsd: remove unused function nfsd_read_file
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:27 -04:00
Christoph Hellwig
12337901d6 nfsd: getattr for FATTR4_WORD0_FILES_AVAIL needs the statfs buffer
Note nobody's ever noticed because the typical client probably never
requests FILES_AVAIL without also requesting something else on the list.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:26 -04:00
Kinglong Mee
be69da8052 NFSD: Error out when getting more than one fsloc/secinfo/uuid
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:25 -04:00
Kinglong Mee
1f53146da9 NFSD: Using type of uint32_t for ex_nflavors instead of int
ex_nflavors can't be negative number, just defined by uint32_t.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:24 -04:00
Kinglong Mee
f0db79d54b NFSD: Add missing comment of "expiry" in expkey_parse()
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:24 -04:00
Kinglong Mee
e6d615f742 NFSD: Remove typedef of svc_client and svc_export in export.c
No need for a typedef wrapper for svc_export or svc_client, remove them.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:23 -04:00
Kinglong Mee
a30ae94c07 NFSD: Cleanup unneeded including net/ipv6.h
Commit 49b28684fd ("nfsd: Remove deprecated nfsctl system call and
related code") removed the only use of ipv6_addr_set_v4mapped(), so
net/ipv6.h is unneeded now.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:22 -04:00
Kinglong Mee
61a27f08a6 NFSD: Cleanup unused variable in nfsd_setuser()
Commit 8f6c5ffc89 ("kernel/groups.c: remove return value of
set_groups") removed the last use of "ret".

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:21 -04:00
Kinglong Mee
0faed901c6 NFSD: remove unneeded linux/user_namespace.h include
After commit 4c1e1b34d5 ("nfsd: Store ex_anon_uid and ex_anon_gid as
kuids and kgids") using kuid/kgid for ex_anon_uid/ex_anon_gid,
user_namespace.h is not needed.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:20 -04:00
Kinglong Mee
94eb36892d NFSD: Adds macro EX_UUID_LEN for exports uuid's length
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:19 -04:00
Kinglong Mee
0d63790c36 NFSD: Helper function for parsing uuid
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:19 -04:00
Kinglong Mee
a1f05514b0 NFS4: Avoid NULL reference or double free in nfsd4_fslocs_free()
If fsloc_parse() failed at kzalloc(), fs/nfsd/export.c
 411
 412         fsloc->locations = kzalloc(fsloc->locations_count
 413                         * sizeof(struct nfsd4_fs_location), GFP_KERNEL);
 414         if (!fsloc->locations)
 415                 return -ENOMEM;

svc_export_parse() will call nfsd4_fslocs_free() with fsloc->locations = NULL,
so that, "kfree(fsloc->locations[i].path);" will cause a crash.

If fsloc_parse() failed after that, fsloc_parse() will call nfsd4_fslocs_free(),
and svc_export_parse() will call it again, so that, a double free is caused.

This patch checks the fsloc->locations, and set to NULL after it be freed.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:18 -04:00
J. Bruce Fields
a5cddc885b nfsd4: better reservation of head space for krb5
RPC_MAX_AUTH_SIZE is scattered around several places.  Better to set it
once in the auth code, where this kind of estimate should be made.  And
while we're at it we can leave it zero when we're not using krb5i or
krb5p.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:17 -04:00
J. Bruce Fields
d05d5744ef nfsd4: kill write32, write64
And switch a couple other functions from the encode(&p,...) convention
to the p = encode(p,...) convention mostly used elsewhere.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:16 -04:00
J. Bruce Fields
0c0c267ba9 nfsd4: kill WRITEMEM
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:15 -04:00
J. Bruce Fields
b64c7f3bdf nfsd4: kill WRITE64
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:14 -04:00
J. Bruce Fields
c373b0a428 nfsd4: kill WRITE32
These macros just obscure what's going on.  Adopt the convention of the
client-side code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:13 -04:00
J. Bruce Fields
c8f13d9775 nfsd4: really fix nfs4err_resource in 4.1 case
encode_getattr, for example, can return nfserr_resource to indicate it
ran out of buffer space.  That's not a legal error in the 4.1 case.
And in the 4.1 case, if we ran out of buffer space, we should have
exceeded a session limit too.

(Note in 1bc49d83c3 "nfsd4: fix
nfs4err_resource in 4.1 case" we originally tried fixing this error
return before fixing the problem that we could error out while we still
had lots of available space.  The result was to trade one illegal error
for another in those cases.  We decided that was helpful, so reverted
the change in fc208d026b, and are only
reinstating it now that we've elimited almost all of those cases.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:13 -04:00
J. Bruce Fields
b042098063 nfsd4: allow exotic read compounds
I'm not sure why a client would want to stuff multiple reads in a
single compound rpc, but it's legal for them to do it, and we should
really support it.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:12 -04:00
J. Bruce Fields
fec25fa4ad nfsd4: more read encoding cleanup
More cleanup, no change in functionality.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:11 -04:00
J. Bruce Fields
34a78b488f nfsd4: read encoding cleanup
Trivial cleanup, no change in functionality.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:10 -04:00
J. Bruce Fields
dc97618ddd nfsd4: separate splice and readv cases
The splice and readv cases are actually quite different--for example the
former case ignores the array of vectors we build up for the latter.

It is probably clearer to separate the two cases entirely.

There's some code duplication between the split out encoders, but this
is only temporary and will be fixed by a later patch.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:09 -04:00
J. Bruce Fields
02fe470774 nfsd4: nfsd_vfs_read doesn't use file handle parameter
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:09 -04:00
J. Bruce Fields
b0e35fda82 nfsd4: turn off zero-copy-read in exotic cases
We currently allow only one read per compound, with operations before
and after whose responses will require no more than about a page to
encode.

While we don't expect clients to violate those limits any time soon,
this limitation isn't really condoned by the spec, so to future proof
the server we should lift the limitation.

At the same time we'd like to continue to support zero-copy reads.

Supporting multiple zero-copy-reads per compound would require a new
data structure to replace struct xdr_buf, which can represent only one
set of included pages.

So for now we plan to modify encode_read() to support either zero-copy
or non-zero-copy reads, and use some heuristics at the start of the
compound processing to decide whether a zero-copy read will work.

This will allow us to support more exotic compounds without introducing
a performance regression in the normal case.

Later patches handle those "exotic compounds", this one just makes sure
zero-copy is turned off in those cases.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:08 -04:00
J. Bruce Fields
ccae70a9ee nfsd4: estimate sequence response size
Otherwise a following patch would turn off all 4.1 zero-copy reads.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:07 -04:00
J. Bruce Fields
b86cef60da nfsd4: better estimate of getattr response size
We plan to use this estimate to decide whether or not to allow zero-copy
reads.  Currently we're assuming all getattr's are a page, which can be
both too small (ACLs e.g. may be arbitrarily long) and too large (after
an upcoming read patch this will unnecessarily prevent zero copy reads
in any read compound also containing a getattr).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:06 -04:00
J. Bruce Fields
476a7b1f4b nfsd4: don't treat readlink like a zero-copy operation
There's no advantage to this zero-copy-style readlink encoding, and it
unnecessarily limits the kinds of compounds we can handle.  (In practice
I can't see why a client would want e.g. multiple readlink calls in a
comound, but it's probably a spec violation for us not to handle it.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:05 -04:00
J. Bruce Fields
3b29970909 nfsd4: enforce rd_dircount
As long as we're here, let's enforce the protocol's limit on the number
of directory entries to return in a readdir.

I don't think anyone's ever noticed our lack of enforcement, but maybe
there's more of a chance they will now that we allow larger readdirs.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:04 -04:00
J. Bruce Fields
561f0ed498 nfsd4: allow large readdirs
Currently we limit readdir results to a single page.  This can result in
a performance regression compared to NFSv3 when reading large
directories.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:03 -04:00
J. Bruce Fields
32aaa62ede nfsd4: use session limits to release send buffer reservation
Once we know the limits the session places on the size of the rpc, we
can also use that information to release any unnecessary reserved reply
buffer space.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:02 -04:00
J. Bruce Fields
47ee529864 nfsd4: adjust buflen to session channel limit
We can simplify session limit enforcement by restricting the xdr buflen
to the session size.

Also fix a preexisting bug: we should really have been taking into
account the auth-required space when comparing against session limits,
which are limits on the size of the entire rpc reply, including any krb5
overhead.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:02 -04:00
J. Bruce Fields
30596768b3 nfsd4: fix buflen calculation after read encoding
We don't necessarily want to assume that the buflen is the same
as the number of bytes available in the pages.  We may have some reason
to set it to something less (for example, later patches will use a
smaller buflen to enforce session limits).

So, calculate the buflen relative to the previous buflen instead of
recalculating it from scratch.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:00 -04:00
J. Bruce Fields
89ff884ebb nfsd4: nfsd4_check_resp_size should check against whole buffer
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:59 -04:00
J. Bruce Fields
6ff9897d2b nfsd4: minor encode_read cleanup
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:58 -04:00
J. Bruce Fields
4f0cefbf38 nfsd4: more precise nfsd4_max_reply
It will turn out to be useful to have a more accurate estimate of reply
size; so, piggyback on the existing op reply-size estimators.

Also move nfsd4_max_reply to nfs4proc.c to get easier access to struct
nfsd4_operation and friends.  (Thanks to Christoph Hellwig for pointing
out that simplification.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:57 -04:00
J. Bruce Fields
8c7424cff6 nfsd4: don't try to encode conflicting owner if low on space
I ran into this corner case in testing: in theory clients can provide
state owners up to 1024 bytes long.  In the sessions case there might be
a risk of this pushing us over the DRC slot size.

The conflicting owner isn't really that important, so let's humor a
client that provides a small maxresponsize_cached by allowing ourselves
to return without the conflicting owner instead of outright failing the
operation.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:55 -04:00
J. Bruce Fields
f5236013a2 nfsd4: convert 4.1 replay encoding
Limits on maxresp_sz mean that we only ever need to replay rpc's that
are contained entirely in the head.

The one exception is very small zero-copy reads.  That's an odd corner
case as clients wouldn't normally ask those to be cached.

in any case, this seems a little more robust.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:55 -04:00
J. Bruce Fields
2825a7f907 nfsd4: allow encoding across page boundaries
After this we can handle for example getattr of very large ACLs.

Read, readdir, readlink are still special cases with their own limits.

Also we can't handle a new operation starting close to the end of a
page.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:54 -04:00
J. Bruce Fields
a8095f7e80 nfsd4: size-checking cleanup
Better variable name, some comments, etc.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:53 -04:00
J. Bruce Fields
ea8d7720b2 nfsd4: remove redundant encode buffer size checking
Now that all op encoders can handle running out of space, we no longer
need to check the remaining size for every operation; only nonidempotent
operations need that check, and that can be done by
nfsd4_check_resp_size.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:52 -04:00
J. Bruce Fields
67492c9903 nfsd4: nfsd4_check_resp_size needn't recalculate length
We're keeping the length updated as we go now, so there's no need for
the extra calculation here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:51 -04:00
J. Bruce Fields
4e21ac4b6f nfsd4: reserve space before inlining 0-copy pages
Once we've included page-cache pages in the encoding it's difficult to
remove them and restart encoding.  (xdr_truncate_encode doesn't handle
that case.)  So, make sure we'll have adequate space to finish the
operation first.

For now COMPOUND_SLACK_SPACE checks should prevent this case happening,
but we want to remove those checks.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:50 -04:00
J. Bruce Fields
d0a381dd0e nfsd4: teach encoders to handle reserve_space failures
We've tried to prevent running out of space with COMPOUND_SLACK_SPACE
and special checking in those operations (getattr) whose result can vary
enormously.

However:
	- COMPOUND_SLACK_SPACE may be difficult to maintain as we add
	  more protocol.
	- BUG_ON or page faulting on failure seems overly fragile.
	- Especially in the 4.1 case, we prefer not to fail compounds
	  just because the returned result came *close* to session
	  limits.  (Though perfect enforcement here may be difficult.)
	- I'd prefer encoding to be uniform for all encoders instead of
	  having special exceptions for encoders containing, for
	  example, attributes.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:49 -04:00
J. Bruce Fields
082d4bd72a nfsd4: "backfill" using write_bytes_to_xdr_buf
Normally xdr encoding proceeds in a single pass from start of a buffer
to end, but sometimes we have to write a few bytes to an earlier
position.

Use write_bytes_to_xdr_buf for these cases rather than saving a pointer
to write to.  We plan to rewrite xdr_reserve_space to handle encoding
across page boundaries using a scratch buffer, and don't want to risk
writing to a pointer that was contained in a scratch buffer.

Also it will no longer be safe to calculate lengths by subtracting two
pointers, so use xdr_buf offsets instead.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:49 -04:00
J. Bruce Fields
1fcea5b20b nfsd4: use xdr_truncate_encode
Now that lengths are reliable, we can use xdr_truncate instead of
open-coding it everywhere.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:48 -04:00
J. Bruce Fields
6ac90391c6 nfsd4: keep xdr buf length updated
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-28 14:52:38 -04:00
J. Bruce Fields
dd97fddedc nfsd4: no need for encode_compoundres to adjust lengths
xdr_reserve_space should now be calculating the length correctly as we
go, so there's no longer any need to fix it up here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-28 14:52:37 -04:00
J. Bruce Fields
f46d382a74 nfsd4: remove ADJUST_ARGS
It's just uninteresting debugging code at this point.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-28 14:52:36 -04:00
J. Bruce Fields
d3f627c815 nfsd4: use xdr_stream throughout compound encoding
Note this makes ADJUST_ARGS useless; we'll remove it in the following
patch.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-28 14:52:35 -04:00
J. Bruce Fields
ddd1ea5636 nfsd4: use xdr_reserve_space in attribute encoding
This is a cosmetic change for now; no change in behavior.

Note we're just depending on xdr_reserve_space to do the bounds checking
for us, we're not really depending on its adjustment of iovec or xdr_buf
lengths yet, as those are fixed up by as necessary after the fact by
read-link operations and by nfs4svc_encode_compoundres.  However we do
have to update xdr->iov on read-like operations to prevent
xdr_reserve_space from messing with the already-fixed-up length of the
the head.

When the attribute encoding fails partway through we have to undo the
length adjustments made so far.  We do it manually for now, but later
patches will add an xdr_truncate_encode() helper to handle cases like
this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-28 14:52:34 -04:00
J. Bruce Fields
5f4ab94587 nfsd4: allow space for final error return
This post-encoding check should be taking into account the need to
encode at least an out-of-space error to the following op (if any).

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-27 11:09:09 -04:00
J. Bruce Fields
07d1f80207 nfsd4: fix encoding of out-of-space replies
If nfsd4_check_resp_size() returns an error then we should really be
truncating the reply here, otherwise we may leave extra garbage at the
end of the rpc reply.

Also add a warning to catch any cases where our reply-size estimates may
be wrong in the case of a non-idempotent operation.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-27 11:09:08 -04:00
J. Bruce Fields
1802a67894 nfsd4: reserve head space for krb5 integ/priv info
Currently if the nfs-level part of a reply would be too large, we'll
return an error to the client.  But if the nfs-level part fits and
leaves no room for krb5p or krb5i stuff, then we just drop the request
entirely.

That's no good.  Instead, reserve some slack space at the end of the
buffer and make sure we fail outright if we'd come close.

The slack space here is a massive overstimate of what's required, we
should probably try for a tighter limit at some point.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:47 -04:00
J. Bruce Fields
2d124dfaad nfsd4: move proc_compound xdr encode init to helper
Mechanical transformation with no change of behavior.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:46 -04:00
J. Bruce Fields
d518465866 nfsd4: tweak nfsd4_encode_getattr to take xdr_stream
Just change the nfsd4_encode_getattr api.  Not changing any code or
adding any new functionality yet.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:46 -04:00
J. Bruce Fields
4aea24b2ff nfsd4: embed xdr_stream in nfsd4_compoundres
This is a mechanical transformation with no change in behavior.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:45 -04:00
J. Bruce Fields
e372ba60de nfsd4: decoding errors can still be cached and require space
Currently a non-idempotent op reply may be cached if it fails in the
proc code but not if it fails at xdr decoding.  I doubt there are any
xdr-decoding-time errors that would make this a problem in practice, so
this probably isn't a serious bug.

The space estimates should also take into account space required for
encoding of error returns.  Again, not a practical problem, though it
would become one after future patches which will tighten the space
estimates.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:44 -04:00
J. Bruce Fields
f34e432b67 nfsd4: fix write reply size estimate
The write reply also includes count and stable_how.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:43 -04:00
J. Bruce Fields
622f560e6a nfsd4: read size estimate should include padding
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:42 -04:00
J. Bruce Fields
24906f3237 nfsd4: allow larger 4.1 session drc slots
The client is actually asking for 2532 bytes.  I suspect that's a
mistake.  But maybe we can allow some more.  In theory lock needs more
if it might return a maximum-length lockowner in the denied case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:41 -04:00
J. Bruce Fields
5b648699af nfsd4: READ, READDIR, etc., are idempotent
OP_MODIFIES_SOMETHING flags operations that we should be careful not to
initiate without being sure we have the buffer space to encode a reply.

None of these ops fall into that category.

We could probably remove a few more, but this isn't a very important
problem at least for ops whose reply size is easy to estimate.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-23 09:03:41 -04:00
NeilBrown
8658452e4a nfsd: Only set PF_LESS_THROTTLE when really needed.
PF_LESS_THROTTLE has a very specific use case: to avoid deadlocks
and live-locks while writing to the page cache in a loop-back
NFS mount situation.

It therefore makes sense to *only* set PF_LESS_THROTTLE in this
situation.
We now know when a request came from the local-host so it could be a
loop-back mount.  We already know when we are handling write requests,
and when we are doing anything else.

So combine those two to allow nfsd to still be throttled (like any
other process) in every situation except when it is known to be
problematic.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-22 15:59:19 -04:00
Christoph Hellwig
abf1135b6e nfsd: remove nfsd4_free_slab
No need for a kmem_cache_destroy wrapper in nfsd, just do proper
goto based unwinding.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-22 15:52:57 -04:00
Benoit Taine
d40aa3372f nfsd: Remove assignments inside conditions
Assignments should not happen inside an if conditional, but in the line
before. This issue was reported by checkpatch.

The semantic patch that makes this change is as follows
(http://coccinelle.lip6.fr/):

// <smpl>

@@
identifier i1;
expression e1;
statement S;
@@
-if(!(i1 = e1)) S
+i1 = e1;
+if(!i1)
+S

// </smpl>

It has been tested by compilation.

Signed-off-by: Benoit Taine <benoit.taine@lip6.fr>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-22 15:52:23 -04:00
J. Bruce Fields
f35ea0d4b6 Merge 3.15 bugfixes for 3.16 2014-05-22 15:48:15 -04:00
J. Bruce Fields
cbf7a75bc5 nfsd4: fix delegation cleanup on error
We're not cleaning up everything we need to on error.  In particular,
we're not removing our lease.  Among other problems this can cause the
struct nfs4_file used as fl_owner to be referenced after it has been
destroyed.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-21 12:17:17 -04:00
Kinglong Mee
368fe39b50 NFSD: Don't clear SUID/SGID after root writing data
We're clearing the SUID/SGID bits on write by hand in nfsd_vfs_write,
even though the subsequent vfs_writev() call will end up doing this for
us (through file system write methods eventually calling
file_remove_suid(), e.g., from __generic_file_aio_write).

So, remove the redundant nfsd code.

The only change in behavior is when the write is by root, in which case
we previously cleared SUID/SGID, but will now leave it alone.  The new
behavior is the behavior of every filesystem we've checked.

It seems better to be consistent with local filesystem behavior.  And
the security advantage seems limited as root could always restore these
bits by hand if it wanted.

SUID/SGID is not cleared after writing data with (root, local ext4),
   File: ‘test’
   Size: 0               Blocks: 0          IO Block: 4096   regular
empty file
Device: 803h/2051d      Inode: 1200137     Links: 1
Access: (4777/-rwsrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:admin_home_t:s0
Access: 2014-04-18 21:36:31.016029014 +0800
Modify: 2014-04-18 21:36:31.016029014 +0800
Change: 2014-04-18 21:36:31.026030285 +0800
  Birth: -
   File: ‘test’
   Size: 5               Blocks: 8          IO Block: 4096   regular file
Device: 803h/2051d      Inode: 1200137     Links: 1
Access: (4777/-rwsrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:admin_home_t:s0
Access: 2014-04-18 21:36:31.016029014 +0800
Modify: 2014-04-18 21:36:31.040032065 +0800
Change: 2014-04-18 21:36:31.040032065 +0800
  Birth: -

With no_root_squash, (root, remote ext4), SUID/SGID are cleared,
   File: ‘test’
   Size: 0               Blocks: 0          IO Block: 262144 regular
empty file
Device: 24h/36d Inode: 786439      Links: 1
Access: (4777/-rwsrwxrwx)  Uid: ( 1000/    test)   Gid: ( 1000/    test)
Context: system_u:object_r:nfs_t:s0
Access: 2014-04-18 21:45:32.155805097 +0800
Modify: 2014-04-18 21:45:32.155805097 +0800
Change: 2014-04-18 21:45:32.168806749 +0800
  Birth: -
   File: ‘test’
   Size: 5               Blocks: 8          IO Block: 262144 regular file
Device: 24h/36d Inode: 786439      Links: 1
Access: (0777/-rwxrwxrwx)  Uid: ( 1000/    test)   Gid: ( 1000/    test)
Context: system_u:object_r:nfs_t:s0
Access: 2014-04-18 21:45:32.155805097 +0800
Modify: 2014-04-18 21:45:32.184808783 +0800
Change: 2014-04-18 21:45:32.184808783 +0800
  Birth: -

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-21 12:17:16 -04:00
J. Bruce Fields
27b11428b7 nfsd4: warn on finding lockowner without stateid's
The current code assumes a one-to-one lockowner<->lock stateid
correspondance.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-21 11:11:21 -04:00
J. Bruce Fields
a1b8ff4c97 nfsd4: remove lockowner when removing lock stateid
The nfsv4 state code has always assumed a one-to-one correspondance
between lock stateid's and lockowners even if it appears not to in some
places.

We may actually change that, but for now when FREE_STATEID releases a
lock stateid it also needs to release the parent lockowner.

Symptoms were a subsequent LOCK crashing in find_lockowner_str when it
calls same_lockowner_ino on a lockowner that unexpectedly has an empty
so_stateids list.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-21 11:11:21 -04:00
J. Bruce Fields
5513a510fa nfsd4: fix corruption on setting an ACL.
As of 06f9cc12ca "nfsd4: don't create
unnecessary mask acl", any non-trivial ACL will be left with an
unitialized entry, and a trivial ACL may write one entry beyond what's
allocated.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-15 15:36:04 -04:00