Commit Graph

3992 Commits

Author SHA1 Message Date
Trond Myklebust
3c13cb5b64 NFSv4.1/pnfs: Play safe w.r.t. close() races when return-on-close is set
If we have an OPEN_DOWNGRADE and CLOSE race with one another, we want
to ensure that the layout is forgotten by the client, so that we
start afresh with a new layoutget.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-18 23:45:13 -05:00
Trond Myklebust
4ff376feaf NFSv4.1/pnfs: Fix a close/delegreturn hang when return-on-close is set
The helper pnfs_roc() has already verified that we have no delegations,
and no further open files, hence no outstanding I/O and it has marked
all the return-on-close lsegs as being invalid.
Furthermore, it sets the NFS_LAYOUT_RETURN bit, thus serialising the
close/delegreturn with all future layoutget calls on this inode.

The checks in pnfs_roc_drain() for valid layout segments are therefore
redundant: those cannot exist until another layoutget completes.
The other check for whether or not NFS_LAYOUT_RETURN is set, actually
causes a hang, since we already know that we hold that flag.

To fix, we therefore strip out all the functionality in pnfs_roc_drain()
except the retrieval of the barrier state, and then rename the function
accordingly.

Reported-by: Christoph Hellwig <hch@infradead.org>
Fixes: 5c4a79fb2b ("Don't prevent layoutgets when doing return-on-close")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-18 23:23:21 -05:00
Trond Myklebust
7e94d6c4ab NFS: Don't fsync twice for O_SYNC/IS_SYNC files
generic_file_write_iter() will already do an fsync on our behalf
if the file descriptor is O_SYNC or the file is marked as IS_SYNC.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 16:55:18 -05:00
Trond Myklebust
7c2dad99d6 NFS: Don't let the ctime override attribute barriers.
Chuck reports seeing cases where a GETATTR that happens to race
with an asynchronous WRITE is overriding the file size, despite
the attribute barrier being set by the writeback code.

The culprit turns out to be the check in nfs_ctime_need_update(),
which sees that the ctime is newer than the cached ctime, and
assumes that it is safe to override the attribute barrier.
This patch removes that override, and ensures that attribute
barriers are always respected.

Reported-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: a08a8cd375 ("NFS: Add attribute update barriers to NFS writebacks")
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 13:37:21 -05:00
Trond Myklebust
74918f937d Merge branch 'layoutfixes'
* layoutfixes:
  NFSv4.1/pnfs: Remove redundant wakeup in pnfs_send_layoutreturn()
  NFSv4.1/pnfs: Remove redundant check in pnfs_layoutgets_blocked()
  NFSv4.1/pnfs: Remove redundant lo->plh_block_lgets in layoutreturn
  NFSv4.1/pnfs: Don't prevent layoutgets when doing return-on-close
  NFSv4.1/pnfs: Fix serialisation of layout return and layoutget
  NFSv4.1/pnfs: Remove redundant checks in pnfs_layoutgets_blocked()
  pNFS: Tighten up locking around DS commit buckets
2015-08-17 13:36:32 -05:00
Trond Myklebust
37bfcc14b2 Merge branch 'bugfixes'
* bugfixes:
  SUNRPC: Fix a thinko in xs_connect()
  NFSv4.1/pNFS: Fix borken function _same_data_server_addrs_locked()
  NFS: nfs_set_pgio_error sometimes misses errors
2015-08-17 13:36:22 -05:00
Trond Myklebust
eed889b161 NFS: NFS over RDMA Client Side Changes
These patches improve both client performance and scalability, most notably
 by increasing the maixmum allowed rsize and wsize and by increasing the number
 of RDMA "credits".  There are also several bugfixes, such as correcting how
 WRITE compounds are encoded and fixing large NFS symlink operations.
 
 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJVy5O3AAoJENfLVL+wpUDr4FsP/RyDj8wZiHmLy5LGlcuIK7lJ
 mOyblxvA1DdADNzHUk9ORG3T6+I2Be6K6gXgD4va+9tmcBAvUEI25OhnKnmau4U+
 mK9ZYUQWISKdd9FHkX33PmgnspxBP0DdS/rO9GeFRh9EhjGmpHQzOb6BbUDe04wA
 PEll9LKZM+NTzOzJL8X5kDA9rXHGLo1T+aN5OMA0EQqxRDurXgLvZXZ5YzLPzgzW
 52KdBIRr0ijFtXeIB87B/ATqEetcDgMRkxB2vMuTbF1Jsc1Fu2xGj2KyncCda+X7
 6rBMB139zB3rBGG2szWfXU733jZmID09esvSHiC5oM6KYVPMxlF9ktvs+dV8g07k
 t+svp4Y1LhHEojfuOKA9arEG7URdlBOUvqwF/UfwFOfcdYR2d2hyfzO/VtsdxhkB
 O+kNvrnu2WsWgiyzEVPw2K0d0I5pAQSSxR2lZKzOYFjpwwjqjuvX+xYeC35BvJHv
 raGvIvdnPBqd9r9DO/v7kFv4xSNLpyi8o0CLtRowWN29LRVJqhz8gWOttBWOJ/1r
 IJYRDMB+R3kWFKe99/ev3IUslV8hekqbH4iYUGuPwFQNurSb69AdrdktsMWHP/8P
 1bMCidNCcNUjui2kTX7HwTRWjjjXwK3+RTSToe6t9iKxMYtLSif4m3K7n/0dAzdK
 ip8Qpx18MuJHphNiXb79
 =7PAr
 -----END PGP SIGNATURE-----

Merge tag 'nfs-rdma-for-4.3' of git://git.linux-nfs.org/projects/anna/nfs-rdma

NFS: NFS over RDMA Client Side Changes

These patches improve both client performance and scalability, most notably
by increasing the maixmum allowed rsize and wsize and by increasing the number
of RDMA "credits".  There are also several bugfixes, such as correcting how
WRITE compounds are encoded and fixing large NFS symlink operations.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-08-17 13:33:58 -05:00
Anna Schumaker
aff8d8dc4c NFS: Remove nfs_release()
And call nfs_file_clear_open_context() directly.  This makes it obvious
that nfs_file_release() will always return 0.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 13:32:56 -05:00
Anna Schumaker
ae09c31f66 NFS: Rename nfs_commit_unstable_pages() to nfs_write_inode()
All nfs_write_inode() does is pass its arguments to
nfs_commit_unstable_pages().  Let's cut out the middle man and have
nfs_write_pages() do the work directly.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 13:32:32 -05:00
Anna Schumaker
3f10a6af4b NFS: Remove nfs41_server_notify_{target|highest}_slotid_update()
All these functions do is call nfs41_ping_server() without adding
anything.  Let's remove them and give nfs41_ping_server() a better name
instead.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 13:32:00 -05:00
Anna Schumaker
fb2a525cf0 NFS: Combine nfs_idmap_{init|quit}() and nfs_idmap_{init|quit}_keyring()
The idmap_init() and idmap_quit() functions only exist to call the
_keyring() version.  Let's just call the keyring() functions directly.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 13:29:56 -05:00
Anna Schumaker
d8efa4e625 NFS: Use RPC functions for matching sockaddrs
They already exist and do the exact same thing.  Let's save ourselves
several lines of code!

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 13:29:51 -05:00
Anna Schumaker
c7e9668e78 NFS: Rename nfs_readdir_free_pagearray() and nfs_readdir_large_page()
nfs_readdir_xdr_to_array() uses both a cache array and an array of
pages, so I rename these functions to make it clearer how the code
works.  nfs_readdir_large_page() becomes nfs_readdir_alloc_pages()
because this function has absolutely nothing to do with setting up a
large page.  nfs_readdir_free_pagearray() becomes
nfs_readdir_free_pages() to stay consistent with the new alloc_pages()
function.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 13:29:31 -05:00
Anna Schumaker
0b936e37df NFS: Remove unused variable "pages_ptr"
This variable is initialized to NULL and is never modified before being
passed to nfs_readdir_free_large_page().  But that's okay, because
nfs_readdir_free_large_page() only seems to exist as a way of calling
nfs_readdir_free_pagearray() without this parameter.  Let's simplify by
removing pages_ptr and nfs_readdir_free_pagearray().

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 13:29:24 -05:00
Jeff Layton
ce60328146 nfs: remove some dead code in ff_layout_pg_get_mirror_count_write
We already know that pg_lseg is NULL here.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 13:25:03 -05:00
Christoph Hellwig
8bb2897582 pnfs: move common blocklayout XDR defintions to nfs4.h
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 13:22:49 -05:00
Christoph Hellwig
513d6d7a95 pnfs/blocklayout: pass proper file mode to blkdev_get/put
We generally want to read and write to a block device that's used by
the pNFS block layout client (and even if it's read only the server
has no way of telling us).  Add FMODE_WRITE to the mode argument
so that we don't incorrectly tell the block driver that we want a
read-only open.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 13:22:49 -05:00
Christoph Hellwig
2bd3c63a33 pnfs/blocklayout: reject too long signatures
Instead of overwriting kernel memory reject too long signatures.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 13:22:49 -05:00
Christoph Hellwig
68596bd188 pnfs/blocklayout: set up layoutupdate_pages properly
We need to replace the __be32 with a void pointer to do proper arithmentics
on the virtual addresses so that we can get the right page pointers.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 13:22:49 -05:00
Christoph Hellwig
29662fa646 pnfs/blocklayout: calculate layoutupdate size correctly
We need to include the first u32 for the number of entries.  Add a helper
for the calculation instead of opencoding it so that it's in one place.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 13:22:49 -05:00
Kinglong Mee
18e3b739fd NFS: Fix a NULL pointer dereference of migration recovery ops for v4.2 client
---Steps to Reproduce--
<nfs-server>
# cat /etc/exports
/nfs/referal  *(rw,insecure,no_subtree_check,no_root_squash,crossmnt)
/nfs/old      *(ro,insecure,subtree_check,root_squash,crossmnt)

<nfs-client>
# mount -t nfs nfs-server:/nfs/ /mnt/
# ll /mnt/*/

<nfs-server>
# cat /etc/exports
/nfs/referal   *(rw,insecure,no_subtree_check,no_root_squash,crossmnt,refer=/nfs/old/@nfs-server)
/nfs/old       *(ro,insecure,subtree_check,root_squash,crossmnt)
# service nfs restart

<nfs-client>
# ll /mnt/*/    --->>>>> oops here

[ 5123.102925] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 5123.103363] IP: [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
[ 5123.103752] PGD 587b9067 PUD 3cbf5067 PMD 0
[ 5123.104131] Oops: 0000 [#1]
[ 5123.104529] Modules linked in: nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev vmw_balloon parport_pc parport i2c_piix4 shpchp auth_rpcgss nfs_acl vmw_vmci lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi serio_raw scsi_transport_spi e1000 mptscsih mptbase ata_generic pata_acpi [last unloaded: nfsd]
[ 5123.105887] CPU: 0 PID: 15853 Comm: ::1-manager Tainted: G           OE   4.2.0-rc6+ #214
[ 5123.106358] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[ 5123.106860] task: ffff88007620f300 ti: ffff88005877c000 task.ti: ffff88005877c000
[ 5123.107363] RIP: 0010:[<ffffffffa03ed38b>]  [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
[ 5123.107909] RSP: 0018:ffff88005877fdb8  EFLAGS: 00010246
[ 5123.108435] RAX: ffff880053f3bc00 RBX: ffff88006ce6c908 RCX: ffff880053a0d240
[ 5123.108968] RDX: ffffea0000e6d940 RSI: ffff8800399a0000 RDI: ffff88006ce6c908
[ 5123.109503] RBP: ffff88005877fe28 R08: ffffffff81c708a0 R09: 0000000000000000
[ 5123.110045] R10: 00000000000001a2 R11: ffff88003ba7f5c8 R12: ffff880054c55800
[ 5123.110618] R13: 0000000000000000 R14: ffff880053a0d240 R15: ffff880053a0d240
[ 5123.111169] FS:  0000000000000000(0000) GS:ffffffff81c27000(0000) knlGS:0000000000000000
[ 5123.111726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5123.112286] CR2: 0000000000000000 CR3: 0000000054cac000 CR4: 00000000001406f0
[ 5123.112888] Stack:
[ 5123.113458]  ffffea0000e6d940 ffff8800399a0000 00000000000167d0 0000000000000000
[ 5123.114049]  0000000000000000 0000000000000000 0000000000000000 00000000a7ec82c6
[ 5123.114662]  ffff88005877fe18 ffffea0000e6d940 ffff8800399a0000 ffff880054c55800
[ 5123.115264] Call Trace:
[ 5123.115868]  [<ffffffffa03fb44b>] nfs4_try_migration+0xbb/0x220 [nfsv4]
[ 5123.116487]  [<ffffffffa03fcb3b>] nfs4_run_state_manager+0x4ab/0x7b0 [nfsv4]
[ 5123.117104]  [<ffffffffa03fc690>] ? nfs4_do_reclaim+0x510/0x510 [nfsv4]
[ 5123.117813]  [<ffffffff810a4527>] kthread+0xd7/0xf0
[ 5123.118456]  [<ffffffff810a4450>] ? kthread_worker_fn+0x160/0x160
[ 5123.119108]  [<ffffffff816d9cdf>] ret_from_fork+0x3f/0x70
[ 5123.119723]  [<ffffffff810a4450>] ? kthread_worker_fn+0x160/0x160
[ 5123.120329] Code: 4c 8b 6a 58 74 17 eb 52 48 8d 55 a8 89 c6 4c 89 e7 e8 4a b5 ff ff 8b 45 b0 85 c0 74 1c 4c 89 f9 48 8b 55 90 48 8b 75 98 48 89 df <41> ff 55 00 3d e8 d8 ff ff 41 89 c6 74 cf 48 8b 4d c8 65 48 33
[ 5123.121643] RIP  [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
[ 5123.122308]  RSP <ffff88005877fdb8>
[ 5123.122942] CR2: 0000000000000000

Fixes: ec011fe847 ("NFS: Introduce a vector of migration recovery ops")
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 13:22:27 -05:00
Trond Myklebust
6f536936b7 NFSv4.1/pNFS: Fix borken function _same_data_server_addrs_locked()
- Switch back to using list_for_each_entry(). Fixes an incorrect test
  for list NULL termination.
- Do not assume that lists are sorted.
- Finally, consider an existing entry to match if it consists of a subset
  of the addresses in the new entry.

Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 13:05:43 -05:00
Trond Myklebust
e9ae58aeee NFS: nfs_set_pgio_error sometimes misses errors
We should ensure that we always set the pgio_header's error field
if a READ or WRITE RPC call returns an error. The current code depends
on 'hdr->good_bytes' always being initialised to a large value, which
is not always done correctly by callers.
When this happens, applications may end up missing important errors.

Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17 13:05:03 -05:00
Trond Myklebust
58830550f0 NFSv4.1/pnfs: Remove redundant wakeup in pnfs_send_layoutreturn()
pnfs_clear_layoutreturn_waitbit() should already be calling
rpc_wake_up(&NFS_SERVER(ino)->roc_rpcwaitq) for us.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12 14:56:19 -04:00
Trond Myklebust
e1c06f80dc NFSv4.1/pnfs: Remove redundant check in pnfs_layoutgets_blocked()
layoutget now should already be serialised w.r.t. layout returns

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12 14:56:19 -04:00
Trond Myklebust
2d8ae84fbc NFSv4.1/pnfs: Remove redundant lo->plh_block_lgets in layoutreturn
The NFS_LAYOUT_RETURN bit already suffices to ensure that layoutget
is blocked.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12 14:56:19 -04:00
Trond Myklebust
5c4a79fb2b NFSv4.1/pnfs: Don't prevent layoutgets when doing return-on-close
If there is an outstanding return-on-close, then we just want new
layoutget requests to wait rather than fail.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12 14:56:19 -04:00
Trond Myklebust
8f70f53a87 NFSv4.1/pnfs: Fix serialisation of layout return and layoutget
We should always test for outstanding layout returns, whether or not
pnfs_should_retry_layoutget() is true.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12 14:56:19 -04:00
Trond Myklebust
a4497a58e4 NFSv4.1/pnfs: Remove redundant checks in pnfs_layoutgets_blocked()
If there are no valid layout segments, then we should already have
checked in pnfs_update_layout() whether or not this is the first
layoutget.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12 14:56:18 -04:00
Trond Myklebust
27571297a7 pNFS: Tighten up locking around DS commit buckets
I'm not aware of any bugreports around this issue, but the locking
around the pnfs_commit_bucket is inconsistent at best. This patch
tightens it up by ensuring that the 'bucket->committing' list is always
changed atomically w.r.t. the 'bucket->clseg' layout segment tracking.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12 14:56:18 -04:00
Kinglong Mee
0847ef88c3 NFS: Remove duplicate svc_xprt_put from nfs41_callback_up
The xprt created by svc_create_xprt have be added to serv->sv_permsocks.
So putting the xprt directly is useless.
Otherwise, there is a more svc_xprt_put after the xprt be freed.

v2, same as v1.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12 14:42:23 -04:00
NeilBrown
efcbc04e16 NFSv4: don't set SETATTR for O_RDONLY|O_EXCL
It is unusual to combine the open flags O_RDONLY and O_EXCL, but
it appears that libre-office does just that.

[pid  3250] stat("/home/USER/.config", {st_mode=S_IFDIR|0700, st_size=8192, ...}) = 0
[pid  3250] open("/home/USER/.config/libreoffice/4-suse/user/extensions/buildid", O_RDONLY|O_EXCL <unfinished ...>

NFSv4 takes O_EXCL as a sign that a setattr command should be sent,
probably to reset the timestamps.

When it was an O_RDONLY open, the SETATTR command does not
identify any actual attributes to change.
If no delegation was provided to the open, the SETATTR uses the
all-zeros stateid and the request is accepted (at least by the
Linux NFS server - no harm, no foul).

If a read-delegation was provided, this is used in the SETATTR
request, and a Netapp filer will justifiably claim
NFS4ERR_BAD_STATEID, which the Linux client takes as a sign
to retry - indefinitely.

So only treat O_EXCL specially if O_CREAT was also given.

Signed-off-by: NeilBrown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12 14:42:23 -04:00
Kinglong Mee
5ef8d792fa NFS: Error out when register_shrinker fail in register_nfs_fs
Commit 1d3d4437ea "vmscan: per-node deferred work" have made
register_shrinker can return an intergater error.

If register_shrinker() fail, the later unregister_shrinker() will
 cause a NULL pointer access.

v2, same as v1.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12 14:42:23 -04:00
Trond Myklebust
c8ad8894e9 NFSv4.2/pnfs: Use GFP_NOIO for layoutstat reporting in the writeback path
Prevent a potential deadlock.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12 14:27:23 -04:00
Peng Tao
d099d7b831 pnfs/flexfiles: LAYOUTSTATS ii_count should be ops instead of bytes
Turned out I misinterpreted the spec...

Cc: Tom Haynes <thomas.haynes@primarydata.com>
Reported-by: Jean Spector <jean@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12 14:26:17 -04:00
Trond Myklebust
86d80f9734 NFSv4.1/pnfs: Fix atomicity of commit list updates
pnfs_layout_mark_request_commit() needs to ensure that it adds the
request to the commit list atomically with all the other updates
in order to prevent corruption to buckets[ds_commit_idx].wlseg
due to races with pnfs_generic_clear_request_commit().

Fixes: 338d00cfef ("pnfs: Refactor the *_layout_mark_request_commit...")
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-10 19:08:13 -04:00
Chuck Lever
2fcc213a18 xprtrdma: Fix large NFS SYMLINK calls
Repair how rpcrdma_marshal_req() chooses which RDMA message type
to use for large non-WRITE operations so that it picks RDMA_NOMSG
in the correct situations, and sets up the marshaling logic to
SEND only the RPC/RDMA header.

Large NFSv2 SYMLINK requests now use RDMA_NOMSG calls. The Linux NFS
server XDR decoder for NFSv2 SYMLINK does not handle having the
pathname argument arrive in a separate buffer. The decoder could be
fixed, but this is simpler and RDMA_NOMSG can be used in a variety
of other situations.

Ensure that the Linux client continues to use "RDMA_MSG + read
list" when sending large NFSv3 SYMLINK requests, which is more
efficient than using RDMA_NOMSG.

Large NFSv4 CREATE(NF4LNK) requests are changed to use "RDMA_MSG +
read list" just like NFSv3 (see Section 5 of RFC 5667). Before,
these did not work at all.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-08-05 16:21:28 -04:00
Linus Torvalds
d8132e08d2 NFS client bugfixes for Linux 4.2
Highlights include:
 
 Stable patches:
 - Fix a situation where the client uses the wrong (zero) stateid.
 - Fix a memory leak in nfs_do_recoalesce
 
 Bugfixes:
 - Plug a memory leak when ->prepare_layoutcommit fails
 - Fix an Oops in the NFSv4 open code
 - Fix a backchannel deadlock
 - Fix a livelock in sunrpc when sendmsg fails due to low memory availability
 - Don't revalidate the mapping if both size and change attr are up to date
 - Ensure we don't miss a file extension when doing pNFS
 - Several fixes to handle NFSv4.1 sequence operation status bits correctly
 - Several pNFS layout return bugfixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVt6RGAAoJEGcL54qWCgDyiDIP/2+fUM7Tc1llCxYbM2WLC6Ar
 34v5yVwO96MqhI4L2mXB5FJvr4LP2/EZ4ZExMcf4ymT7pgJnjFK4nEv9IHUSy6xb
 ea+oS9GjvFSeGdkukJLRniNER5/ZG3GWkojlHNJCgByoIVRK4ISXF/qL9w2sedGw
 +5ejvjqie9NmBnBXMq8DRlU+kXhVYCF6E9qWATwUNK5Eq2eeQnDbA2w9ACSBVK3W
 LhCvZi0eBq7krSbHob018PmlQ0VPvmYwk5xL4d//FvcaNj/utk82VjAZCdKOK1sH
 qn8hcKgVeVko/3jwcUp6m3zAkKZ1IX/XaXJeHbosnKG/g0vy3hQirpa/g2iDTQ4H
 NXOSwcsd6syReZDZbQTxbvaSOp5ACxZAQKYLnlPerJ/hMpXDQCEAwyeAFKzEaKz4
 FfF0VJF+30w9PJk3wgk2DF66xbYVfHyvrLtVcb/ki8gb91cH09i+nFFSSfHQBMLh
 +ciHg7rOyXnbXoCaW9fBvONz2sCYDwbHATmhpWWZIx/3UTDf5owxHFa3BFDgGKnD
 jyiPjMh6I3JUE+Qm1zwInsfsskBKRSl2BdJgTHBGY5ODuQGF/sogOmvgbrT7Ox3t
 kbL8nzCydqLixM+4aw61nYakZqgDsKNER5Ggr+lkv4AZ2dH6IeP2IZjuoHLLylvZ
 dyqHwpCjoUtmYAUr166U
 =wlUD
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

  Stable patches:
   - Fix a situation where the client uses the wrong (zero) stateid.
   - Fix a memory leak in nfs_do_recoalesce

  Bugfixes:
   - Plug a memory leak when ->prepare_layoutcommit fails
   - Fix an Oops in the NFSv4 open code
   - Fix a backchannel deadlock
   - Fix a livelock in sunrpc when sendmsg fails due to low memory
     availability
   - Don't revalidate the mapping if both size and change attr are up to
     date
   - Ensure we don't miss a file extension when doing pNFS
   - Several fixes to handle NFSv4.1 sequence operation status bits
     correctly
   - Several pNFS layout return bugfixes"

* tag 'nfs-for-4.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits)
  nfs: Fix an oops caused by using other thread's stack space in ASYNC mode
  nfs: plug memory leak when ->prepare_layoutcommit fails
  SUNRPC: Report TCP errors to the caller
  sunrpc: translate -EAGAIN to -ENOBUFS when socket is writable.
  NFSv4.2: handle NFS-specific llseek errors
  NFS: Don't clear desc->pg_moreio in nfs_do_recoalesce()
  NFS: Fix a memory leak in nfs_do_recoalesce
  NFS: nfs_mark_for_revalidate should always set NFS_INO_REVAL_PAGECACHE
  NFS: Remove the "NFS_CAP_CHANGE_ATTR" capability
  NFS: Set NFS_INO_REVAL_PAGECACHE if the change attribute is uninitialised
  NFS: Don't revalidate the mapping if both size and change attr are up to date
  NFSv4/pnfs: Ensure we don't miss a file extension
  NFSv4: We must set NFS_OPEN_STATE flag in nfs_resync_open_stateid_locked
  SUNRPC: xprt_complete_bc_request must also decrement the free slot count
  SUNRPC: Fix a backchannel deadlock
  pNFS: Don't throw out valid layout segments
  pNFS: pnfs_roc_drain() fix a race with open
  pNFS: Fix races between return-on-close and layoutreturn.
  pNFS: pnfs_roc_drain should return 'true' when sleeping
  pNFS: Layoutreturn must invalidate all existing layout segments.
  ...
2015-07-28 09:37:44 -07:00
Kinglong Mee
a49c269111 nfs: Fix an oops caused by using other thread's stack space in ASYNC mode
An oops caused by using other thread's stack space in sunrpc ASYNC sending thread.

[ 9839.007187] ------------[ cut here ]------------
[ 9839.007923] kernel BUG at fs/nfs/nfs4xdr.c:910!
[ 9839.008069] invalid opcode: 0000 [#1] SMP
[ 9839.008069] Modules linked in: blocklayoutdriver rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm joydev iosf_mbi crct10dif_pclmul snd_timer crc32_pclmul crc32c_intel ghash_clmulni_intel snd soundcore ppdev pvpanic parport_pc i2c_piix4 serio_raw virtio_balloon parport acpi_cpufreq nfsd nfs_acl lockd grace auth_rpcgss sunrpc qxl drm_kms_helper virtio_net virtio_console virtio_blk ttm drm virtio_pci virtio_ring virtio ata_generic pata_acpi
[ 9839.008069] CPU: 0 PID: 308 Comm: kworker/0:1H Not tainted 4.0.0-0.rc4.git1.3.fc23.x86_64 #1
[ 9839.008069] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 9839.008069] Workqueue: rpciod rpc_async_schedule [sunrpc]
[ 9839.008069] task: ffff8800d8b4d8e0 ti: ffff880036678000 task.ti: ffff880036678000
[ 9839.008069] RIP: 0010:[<ffffffffa0339cc9>]  [<ffffffffa0339cc9>] reserve_space.part.73+0x9/0x10 [nfsv4]
[ 9839.008069] RSP: 0018:ffff88003667ba58  EFLAGS: 00010246
[ 9839.008069] RAX: 0000000000000000 RBX: 000000001fc15e18 RCX: ffff8800c0193800
[ 9839.008069] RDX: ffff8800e4ae3f24 RSI: 000000001fc15e2c RDI: ffff88003667bcd0
[ 9839.008069] RBP: ffff88003667ba58 R08: ffff8800d9173008 R09: 0000000000000003
[ 9839.008069] R10: ffff88003667bcd0 R11: 000000000000000c R12: 0000000000010000
[ 9839.008069] R13: ffff8800d9173350 R14: 0000000000000000 R15: ffff8800c0067b98
[ 9839.008069] FS:  0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
[ 9839.008069] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9839.008069] CR2: 00007f988c9c8bb0 CR3: 00000000d99b6000 CR4: 00000000000407f0
[ 9839.008069] Stack:
[ 9839.008069]  ffff88003667bbc8 ffffffffa03412c5 00000000c6c55680 ffff880000000003
[ 9839.008069]  0000000000000088 00000010c6c55680 0001000000000002 ffffffff816e87e9
[ 9839.008069]  0000000000000000 00000000477290e2 ffff88003667bab8 ffffffff81327ba3
[ 9839.008069] Call Trace:
[ 9839.008069]  [<ffffffffa03412c5>] encode_attrs+0x435/0x530 [nfsv4]
[ 9839.008069]  [<ffffffff816e87e9>] ? inet_sendmsg+0x69/0xb0
[ 9839.008069]  [<ffffffff81327ba3>] ? selinux_socket_sendmsg+0x23/0x30
[ 9839.008069]  [<ffffffff8164c1df>] ? do_sock_sendmsg+0x9f/0xc0
[ 9839.008069]  [<ffffffff8164c278>] ? kernel_sendmsg+0x58/0x70
[ 9839.008069]  [<ffffffffa011acc0>] ? xdr_reserve_space+0x20/0x170 [sunrpc]
[ 9839.008069]  [<ffffffffa011acc0>] ? xdr_reserve_space+0x20/0x170 [sunrpc]
[ 9839.008069]  [<ffffffffa0341b40>] ? nfs4_xdr_enc_open_noattr+0x130/0x130 [nfsv4]
[ 9839.008069]  [<ffffffffa03419a5>] encode_open+0x2d5/0x340 [nfsv4]
[ 9839.008069]  [<ffffffffa0341b40>] ? nfs4_xdr_enc_open_noattr+0x130/0x130 [nfsv4]
[ 9839.008069]  [<ffffffffa011ab89>] ? xdr_encode_opaque+0x19/0x20 [sunrpc]
[ 9839.008069]  [<ffffffffa0339cfb>] ? encode_string+0x2b/0x40 [nfsv4]
[ 9839.008069]  [<ffffffffa0341bf3>] nfs4_xdr_enc_open+0xb3/0x140 [nfsv4]
[ 9839.008069]  [<ffffffffa0110a4c>] rpcauth_wrap_req+0xac/0xf0 [sunrpc]
[ 9839.008069]  [<ffffffffa01017db>] call_transmit+0x18b/0x2d0 [sunrpc]
[ 9839.008069]  [<ffffffffa0101650>] ? call_decode+0x860/0x860 [sunrpc]
[ 9839.008069]  [<ffffffffa0101650>] ? call_decode+0x860/0x860 [sunrpc]
[ 9839.008069]  [<ffffffffa010caa0>] __rpc_execute+0x90/0x460 [sunrpc]
[ 9839.008069]  [<ffffffffa010ce85>] rpc_async_schedule+0x15/0x20 [sunrpc]
[ 9839.008069]  [<ffffffff810b452b>] process_one_work+0x1bb/0x410
[ 9839.008069]  [<ffffffff810b47d3>] worker_thread+0x53/0x470
[ 9839.008069]  [<ffffffff810b4780>] ? process_one_work+0x410/0x410
[ 9839.008069]  [<ffffffff810b4780>] ? process_one_work+0x410/0x410
[ 9839.008069]  [<ffffffff810ba7b8>] kthread+0xd8/0xf0
[ 9839.008069]  [<ffffffff810ba6e0>] ? kthread_worker_fn+0x180/0x180
[ 9839.008069]  [<ffffffff81786418>] ret_from_fork+0x58/0x90
[ 9839.008069]  [<ffffffff810ba6e0>] ? kthread_worker_fn+0x180/0x180
[ 9839.008069] Code: 00 00 48 c7 c7 21 fa 37 a0 e8 94 1c d6 e0 c6 05 d2 17 05 00 01 8b 03 eb d7 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 66 66 66 66 90 55 48 89 e5 41 54 53 89 f3
[ 9839.008069] RIP  [<ffffffffa0339cc9>] reserve_space.part.73+0x9/0x10 [nfsv4]
[ 9839.008069]  RSP <ffff88003667ba58>
[ 9839.071114] ---[ end trace cc14c03adb522e94 ]---

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-28 09:07:03 -04:00
Jeff Layton
3471648a75 nfs: plug memory leak when ->prepare_layoutcommit fails
"data" is currently leaked when the prepare_layoutcommit operation
returns an error. Put the cred before taking the spinlock in that
case, take the lock and then goto out_unlock which will drop the
lock and then free "data".

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-28 09:07:02 -04:00
J. Bruce Fields
bdcc2cd14e NFSv4.2: handle NFS-specific llseek errors
Handle NFS-specific llseek errors instead of letting them leak out to
userspace.

Reported-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-27 11:16:25 -04:00
Trond Myklebust
d4c30454db NFS: Don't clear desc->pg_moreio in nfs_do_recoalesce()
Recoalescing does not affect whether or not we've already sent off
I/O, and doing so means that we end up sending a bunch of synchronous
for cases where we actually need to be using unstable writes.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-27 10:33:12 -04:00
Trond Myklebust
03d5eb65b5 NFS: Fix a memory leak in nfs_do_recoalesce
If the function exits early, then we must put those requests that were
not processed back onto the &mirror->pg_list so they can be cleaned up
by nfs_pgio_error().

Fixes: a7d42ddb30 ("nfs: add mirroring support to pgio layer")
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-27 10:33:08 -04:00
Trond Myklebust
cd81259979 NFS: Remove the "NFS_CAP_CHANGE_ATTR" capability
Setting the change attribute has been mandatory for all NFS versions, since
commit 3a1556e866 ("NFSv2/v3: Simulate the change attribute"). We should
therefore not have anything be conditional on it being set/unset.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-22 17:15:54 -04:00
Trond Myklebust
5c675d6420 NFS: Set NFS_INO_REVAL_PAGECACHE if the change attribute is uninitialised
We can't allow caching of data until the change attribute has been
initialised correctly.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-22 17:15:53 -04:00
Trond Myklebust
85a23cee3f NFS: Don't revalidate the mapping if both size and change attr are up to date
If we've ensured that the size and the change attribute are both correct,
then there is no point in marking those attributes as needing revalidation
again. Only do so if we know the size is incorrect and was not updated.

Fixes: f2467b6f64 ("NFS: Clear NFS_INO_REVAL_PAGECACHE when...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-22 17:15:53 -04:00
Trond Myklebust
2b83d3de4c NFSv4/pnfs: Ensure we don't miss a file extension
pNFS writes don't return attributes, however that doesn't mean that we
should ignore the fact that they may be extending the file. This patch
ensures that if a write is seen to extend the file, then we always set
an attribute barrier, and update the cached file size.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-22 17:15:53 -04:00
Trond Myklebust
3c38cbe2ad NFSv4: We must set NFS_OPEN_STATE flag in nfs_resync_open_stateid_locked
Otherwise, nfs4_select_rw_stateid() will always return the zero stateid
instead of the correct open stateid.

Fixes: f95549cf24 ("NFSv4: More CLOSE/OPEN races")
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-22 17:10:51 -04:00
Jeff Layton
83bfff23e9 nfs4: have do_vfs_lock take an inode pointer
Now that we have file locking helpers that can deal with an inode
instead of a filp, we can change the NFSv4 locking code to use that
instead.

This should fix the case where we have a filp that is closed while flock
or OFD locks are set on it, and the task is signaled so that it doesn't
wait for the LOCKU reply to come in before the filp is freed. At that
point we can end up with a use-after-free with the current code, which
relies on dereferencing the fl_file in the lock request.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: "J. Bruce Fields" <bfields@fieldses.org>
Tested-by: "J. Bruce Fields" <bfields@fieldses.org>
2015-07-13 06:29:11 -04:00
Jeff Layton
ed05676427 Revert "nfs: take extra reference to fl->fl_file when running a LOCKU operation"
This reverts commit db2efec0ca.

William reported that he was seeing instability with this patch, which
is likely due to the fact that it can cause the kernel to take a new
reference to a filp after the last reference has already been put.

Revert this patch for now, as we'll need to fix this in another way.

Cc: stable@vger.kernel.org
Reported-by: William Dauchy <william@gandi.net>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: "J. Bruce Fields" <bfields@fieldses.org>
Tested-by: "J. Bruce Fields" <bfields@fieldses.org>
2015-07-13 06:29:11 -04:00