Commit Graph

5103 Commits

Author SHA1 Message Date
Trond Myklebust
977294c719 NFSv4: Fix a compiler warning when CONFIG_NFS_V4_1 is undefined
Fix a compiler warning:
fs/nfs/nfs4proc.c:910:13: warning: 'nfs4_layoutget_release' defined but not used [-Wunused-function]
 static void nfs4_layoutget_release(void *calldata)
             ^~~~~~~~~~~~~~~~~~~~~~

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-05 10:38:39 -04:00
Trond Myklebust
3f0b3cf46e NFS: Filter cache invalidation when holding a delegation
If the client holds a delegation, then ensure we filter out attempts
to invalidate the size, owner, group owner, or mode unless we made the
change, in which case, check that NFS_INO_REVAL_FORCED is set by the
caller.
Always filter out attempts to invalidate the change attribute and
size, since we are authoritative for those.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04 15:03:58 -04:00
Trond Myklebust
4ebe83af20 NFS: Ignore NFS_INO_REVAL_FORCED in nfs_check_inode_attributes()
If we hold a delegation, we should not need to call
nfs_check_inode_attributes() since we already know which attributes
are valid, and which ones may still need revalidation. The state
of the NFS_INO_REVAL_FORCED flag is therefore irrelevant.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04 15:03:58 -04:00
Trond Myklebust
c80d17c55d NFS: Improve caching while holding a delegation
Make sure that the client completely ignores change attribute and size
changes on the server when it holds a delegation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04 15:03:58 -04:00
Trond Myklebust
0b467264d0 NFS: Fix attribute revalidation
Don't mark attributes as invalid just because they have changed. Instead,
for the purposes of adjusting the attribute cache timeout, keep a
separate variable that tracks whether or not a change occurred.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04 15:03:58 -04:00
Trond Myklebust
6a97d02dfe NFS: fix up nfs_setattr_update_inode
Always try to set the attributes, even if we don't have a valid struct
nfs_fattr.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04 15:03:58 -04:00
Trond Myklebust
97c2c17af9 NFSv4: Ensure the inode is clean when we set a delegation
If there are attributes that are still invalid when we set a delegation,
then we need to set the NFS_INO_REVAL_FORCED flag.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04 15:03:58 -04:00
Trond Myklebust
7c6726546c NFSv4: Ignore NFS_INO_REVAL_FORCED in nfs4_proc_access
If we hold a delegation, we don't need to care about whether or not
the inode attributes are up to date. We know we can cache the results
of this call regardless.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04 15:03:20 -04:00
Trond Myklebust
2f28dc385a NFSv4: Don't ask for delegated attributes when adding a hard link
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04 12:07:07 -04:00
Trond Myklebust
771734f291 NFSv4: Don't ask for delegated attributes when revalidating the inode
Again, when revalidating the inode, we don't need to ask for attributes
for which we are authoritative.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04 12:07:07 -04:00
Trond Myklebust
a841b54dbd NFS: Pass the inode down to the getattr() callback
Allow the getattr() callback to check things like whether or not we hold
a delegation so that it can adjust the attributes that it is asking for.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04 12:07:07 -04:00
Trond Myklebust
30846df06f NFSv4: Don't request size+change attribute if they are delegated to us
When we hold a delegation, we should not need to request attributes such
as the file size or the change attribute. For some servers, avoiding
asking for these unneeded attributes can improve the overall system
performance.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04 12:07:07 -04:00
Trond Myklebust
ae55e59da0 pnfs: Don't release the sequence slot until we've processed layoutget on open
If the server recalls the layout that was just handed out, we risk hitting
a race as described in RFC5661 Section 2.10.6.3 unless we ensure that we
release the sequence slot after processing the LAYOUTGET operation that
was sent as part of the OPEN compound.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:12 -04:00
Trond Myklebust
32f1c28f3d pnfs: Don't call commit on failed layoutget-on-open
If the layoutget on open call failed, we can't really commit the inode,
so don't bother calling it.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:12 -04:00
Trond Myklebust
64294b08f9 pNFS: Don't send LAYOUTGET on OPEN for read, if we already have cached data
If we're only opening the file for reading, and the file is empty and/or
we already have cached data, then heuristically optimise away the
LAYOUTGET.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:12 -04:00
Trond Myklebust
8dc96566c0 NFSv4/pnfs: Don't switch off layoutget-on-open for transient errors
Ensure that we only switch off the LAYOUTGET operation in the OPEN
compound when the server is truly broken, and/or it is complaining
that the compound is too large.
Currently, we end up turning off the functionality permanently,
even for transient errors such as EACCES or ENOSPC.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Trond Myklebust
d49e0d5b99 NFSv4/pnfs: Ensure pnfs_parse_lgopen() won't try to parse uninitialised data
We need to ensure that pnfs_parse_lgopen() doesn't try to parse a
struct nfs4_layoutget_res that was not filled by a successful call
to decode_layoutget(). This can happen if we performed a cached open,
or if either the OP_ACCESS or OP_GETATTR operations preceding the
OP_LAYOUTGET in the compound returned an error.

By initialising the 'status' field to NFS4ERR_DELAY, we ensure that
pnfs_parse_lgopen() won't try to interpret the structure.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Fred Isaman
30ae2412e9 pnfs: Fix manipulation of NFS_LAYOUT_FIRST_LAYOUTGET
The flag was not always being cleared after LAYOUTGET on OPEN.

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Fred Isaman
c49b5209f9 pnfs: Add barrier to prevent lgopen using LAYOUTGET during recall
Since the LAYOUTGET on OPEN can be sent without prior inode information,
existing methods to prevent LAYOUTGET from being sent while processing
CB_LAYOUTRECALL don't work.  Track if a recall occurred while LAYOUTGET
was being sent, and if so ignore the results.

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Fred Isaman
6e01260cee pnfs: Stop attempting LAYOUTGET on OPEN on failure
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Fred Isaman
78746a384c pnfs: Add LAYOUTGET to OPEN of an existing file
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Trond Myklebust
29a8bfe52d pNFS: Refactor nfs4_layoutget_release()
Move the actual freeing of the struct nfs4_layoutget into fs/nfs/pnfs.c
where it can be reused by the layoutget on open code.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Fred Isaman
2409a976a2 pnfs: Add LAYOUTGET to OPEN of a new file
This triggers when have no pre-existing inode to attach to.
The preexisting case is saved for later.

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Fred Isaman
5e36e2a941 pnfs: Change pnfs_alloc_init_layoutget_args call signature
Don't send in a layout, instead use the (possibly NULL) inode.

This is needed for LAYOUTGET attached to an OPEN where the inode is not
yet set.

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Fred Isaman
1b146fcff7 pnfs: Move nfs4_opendata into nfs4_fs.h
It will be needed now by the pnfs code.

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Fred Isaman
56f487f8c8 pnfs: Add conditional encode/decode of LAYOUTGET within OPEN compound
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Fred Isaman
dacb452db8 pnfs: move allocations out of nfs4_proc_layoutget
They work better in the new alloc_init function.

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Fred Isaman
587f03deb6 pnfs: refactor send_layoutget
Pull out the alloc/init part for eventual reuse by OPEN.

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Fred Isaman
f86c3ac502 pnfs: Add layout driver flag PNFS_LAYOUTGET_ON_OPEN
Driver can set flag to allow LAYOUTGET to be sent with OPEN.

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Fred Isaman
3b65a30df9 NFS4: move ctx into nfs4_run_open_task
Preparing to add conditional LAYOUTGET to OPEN rpc, the LAYOUTGET
will need the ctx info.

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Fred Isaman
808ba32abe pnfs: Store return value of decode_layoutget for later processing
This will be needed to seperate return value of OPEN and LAYOUTGET
when they are combined into a single RPC.

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:10 -04:00
Fred Isaman
34ec9aac7d pnfs: Remove redundant assignment from nfs4_proc_layoutget().
nfs_init_sequence() will clear this for us.

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:10 -04:00
Benjamin Coddington
a3cf9bca2a NFSv4: Don't add a new lock on an interrupted wait for LOCK
If the wait for a LOCK operation is interrupted, and then the file is
closed, the locks cleanup code will assume that no new locks will be added
to the inode after it has completed.  We already have a mechanism to detect
if there was signal, so let's use that to avoid recreating the local lock
once the RPC completes.  Also skip re-sending the LOCK operation for the
various error cases if we were signaled.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
[Trond: Fix inverted test of locks_lock_inode_wait()]
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:02:16 -04:00
Trond Myklebust
cf61eb2686 NFSv4: Always clear the pNFS layout when handling ESTALE
If we get an ESTALE error in response to an RPC call operating on the
file on the MDS, we should immediately cancel the layout for that file.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:02:16 -04:00
Dave Wysochanski
d68894800e NFSv4: Fix possible 1-byte stack overflow in nfs_idmap_read_and_verify_message
In nfs_idmap_read_and_verify_message there is an incorrect sprintf '%d'
that converts the __u32 'im_id' from struct idmap_msg to 'id_str', which
is a stack char array variable of length NFS_UINT_MAXLEN == 11.
If a uid or gid value is > 2147483647 = 0x7fffffff, the conversion
overflows into a negative value, for example:
crash> p (unsigned) (0x80000000)
$1 = 2147483648
crash> p (signed) (0x80000000)
$2 = -2147483648
The '-' sign is written to the buffer and this causes a 1 byte overflow
when the NULL byte is written, which corrupts kernel stack memory.  If
CONFIG_CC_STACKPROTECTOR_STRONG is set we see a stack-protector panic:

[11558053.616565] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffa05b8a8c
[11558053.639063] CPU: 6 PID: 9423 Comm: rpc.idmapd Tainted: G        W      ------------ T 3.10.0-514.el7.x86_64 #1
[11558053.641990] Hardware name: Red Hat OpenStack Compute, BIOS 1.10.2-3.el7_4.1 04/01/2014
[11558053.644462]  ffffffff818c7bc0 00000000b1f3aec1 ffff880de0f9bd48 ffffffff81685eac
[11558053.646430]  ffff880de0f9bdc8 ffffffff8167f2b3 ffffffff00000010 ffff880de0f9bdd8
[11558053.648313]  ffff880de0f9bd78 00000000b1f3aec1 ffffffff811dcb03 ffffffffa05b8a8c
[11558053.650107] Call Trace:
[11558053.651347]  [<ffffffff81685eac>] dump_stack+0x19/0x1b
[11558053.653013]  [<ffffffff8167f2b3>] panic+0xe3/0x1f2
[11558053.666240]  [<ffffffff811dcb03>] ? kfree+0x103/0x140
[11558053.682589]  [<ffffffffa05b8a8c>] ? idmap_pipe_downcall+0x1cc/0x1e0 [nfsv4]
[11558053.689710]  [<ffffffff810855db>] __stack_chk_fail+0x1b/0x30
[11558053.691619]  [<ffffffffa05b8a8c>] idmap_pipe_downcall+0x1cc/0x1e0 [nfsv4]
[11558053.693867]  [<ffffffffa00209d6>] rpc_pipe_write+0x56/0x70 [sunrpc]
[11558053.695763]  [<ffffffff811fe12d>] vfs_write+0xbd/0x1e0
[11558053.702236]  [<ffffffff810acccc>] ? task_work_run+0xac/0xe0
[11558053.704215]  [<ffffffff811fec4f>] SyS_write+0x7f/0xe0
[11558053.709674]  [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b

Fix this by calling the internally defined nfs_map_numeric_to_string()
function which properly uses '%u' to convert this __u32.  For consistency,
also replace the one other place where snprintf is called.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reported-by: Stephen Johnston <sjohnsto@redhat.com>
Fixes: cf4ab538f1 ("NFSv4: Fix the string length returned by the idmapper")
Cc: stable@vger.kernel.org # v3.4+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:02:16 -04:00
Trond Myklebust
d554168f87 NFS: Fix up nfs_post_op_update_inode() to force ctime updates
We do not want to ignore ctime updates that originate from functions
such as link().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:02:16 -04:00
Trond Myklebust
472f761e11 NFS: Ensure we revalidate the inode correctly after setacl
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:02:16 -04:00
Trond Myklebust
59a707b0d4 NFS: Ensure we revalidate the inode correctly after remove or rename
We may need to revalidate the change attribute, ctime and the nlinks count.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:02:16 -04:00
Trond Myklebust
821a868a23 NFS: Set the force revalidate flag if the inode is not completely initialised
Ensure that a delegation doesn't cause us to skip initialising the inode
if it was incomplete when we exited nfs_fhget()

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:02:16 -04:00
Trond Myklebust
3cb3fd6da4 NFS: Fix up sillyrename()
Ensure that we register the fact that the inode ctime has changed.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:02:16 -04:00
Trond Myklebust
ed7e9ad090 NFSv4: Fix sillyrename to return the delegation when appropriate
Ensure that we pass down the inode of the file being deleted so
that we can return any delegation being held.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:02:16 -04:00
Trond Myklebust
991eedb137 NFSv4: Only pass the delegation to setattr if we're sending a truncate
Even then it isn't really necessary. The reason why we may not want to
pass in a stateid in other cases is that we cannot use the delegation
credential.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:02:16 -04:00
Anna Schumaker
2f261020b6 NFS: Merge nfs41_free_stateid() with _nfs41_free_stateid()
Having these exist as two functions doesn't seem to add anything useful,
and I think merging them together makes this easier to follow.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:02:16 -04:00
Anna Schumaker
fba83f3411 NFS: Pass "privileged" value to nfs4_init_sequence()
We currently have a separate function just to set this, but I think it
makes more sense to set it at the same time as the other values in
nfs4_init_sequence()

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:02:16 -04:00
Anna Schumaker
e9ae1ee2b2 NFS: Move call to nfs4_state_protect() to nfs4_commit_setup()
Rather than doing this in the generic NFS client code.  Let's put this
with the other v4 stuff so it's all in one place.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:02:16 -04:00
Anna Schumaker
fb91fb0ee7 NFS: Move call to nfs4_state_protect_write() to nfs4_write_setup()
This doesn't really need to be in the generic NFS client code, and I
think it makes more sense to keep the v4 code in one place.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:02:16 -04:00
NeilBrown
e04bbf6b1b NFS: Avoid quadratic search when freeing delegations.
There are three places that walk all delegation for an nfs_client and
restart whenever they find something interesting - potentially
resulting in a quadratic search:  If there are 10,000 uninteresting
delegations followed by 10,000 interesting one, then the code
skips over 100,000,000 delegations, which can take a noticeable amount
of time.

Of these nfs_delegation_reap_unclaimed() and
nfs_reap_expired_delegations() are only called during unusual events:
a server reboots or reports expired delegations, probably due to a
network partition.  Optimizing these is not particularly important.

The third, nfs_client_return_marked_delegations(), is called
periodically via nfs_expire_unreferenced_delegations().  It could
cause periodic problems on a busy server.

New delegations are added to the end of the list, so if there are
10,000 open files with delegations, and 10,000 more recently opened files
that received delegations but are now closed, then
nfs_client_return_marked_delegations() can take seconds to skip over
the 10,000 open files 10,000 times.  That is a waste of time.

The avoid this waste a place-holder (an inode) is kept when locks are
dropped, so that the place can usually be found again after taking
rcu_readlock().  This place holder ensure that we find the right
starting point in the list of nfs_servers, and makes is probable that
we find the right starting point in the list of delegations.
We might need to occasionally restart at the head of that list.

It might be possible that the place_holder inode could lose its
delegation separately, and then get a new one using the same (freed
and then reallocated) 'struct nfs_delegation'.  Were this to happen,
the new delegation would be at the end of the list and we would miss
returning some other delegations.  This would have the effect of
unnecessarily delaying the return of some unused delegations until the
next time this function is called - typically 90 seconds later.  As
this is not a correctness issue and is vanishingly unlikely to happen,
it does not seem worth addressing.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:02:14 -04:00
NeilBrown
3ca951b618 NFS: use cond_resched() when restarting walk of delegation list.
In three places we walk the list of delegations for an nfs_client
until an interesting one is found, then we act of that delegation
and restart the walk.

New delegations are added to the end of a list and the interesting
delegations are usually old, so in many case we won't repeat
a long walk over and over again, but it is possible - particularly if
the first server in the list has a large number of uninteresting
delegations.

In each cache the work done on interesting delegations will often
complete without sleeping, so this could loop many times without
giving up the CPU.

So add a cond_resched() at an appropriate point to avoid hogging the
CPU for too long.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 14:59:19 -04:00
NeilBrown
f389349142 NFS: slight optimization for walking list for delegations
There are 3 places where we walk the list of delegations
for an nfs_client.
In each case there are two nested loops, one for nfs_servers
and one for nfs_delegations.

When we find an interesting delegation we try to get an active
reference to the server.  If that fails, it is pointless to
continue to look at the other delegation for the server as
we will never be able to get an active reference.
So instead of continuing in the inner loop, break out
and continue in the outer loop.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 14:59:19 -04:00
Trond Myklebust
9f6d44d418 NFS: Optimise away lookups for rename targets
We can optimise away any lookup for a rename target, unless we're
being asked to revalidate a dentry that might be in use.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-28 13:29:19 -04:00