linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-20 10:01:56 +00:00

Author	SHA1	Message	Date
Trond Myklebust	27797d1bb3	pNFS/flexfiles: Remove unused struct members user_name, group_name Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-24 10:17:37 -04:00
Peng Tao	8733408d6e	pnfs: add pnfs_report_layoutstat helper function Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-24 10:17:37 -04:00
Peng Tao	1b4a4bd82c	pNFS: fill in nfs42_layoutstat_ops Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-24 10:17:37 -04:00
Trond Myklebust	be3a5d2339	NFSv.2/pnfs Add a LAYOUTSTATS rpc function Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-24 10:17:37 -04:00
Trond Myklebust	1372a3130a	Merge branch 'bugfixes' * bugfixes: NFS: Ensure we set NFS_CONTEXT_RESEND_WRITES when requeuing writes pNFS: Fix a memory leak when attempted pnfs fails NFS: Ensure that we update the sequence id under the slot table lock nfs: Initialize cb_sequenceres information before validate_seqid() nfs: Only update callback sequnce id when CB_SEQUENCE success NFSv4: nfs4_handle_delegation_recall_error should ignore EAGAIN	2015-06-22 09:55:08 -04:00
Yijing Wang	dfad7000f3	nfs: Fix comment for nfs_pageio_init() and nfs_pageio_complete_mirror() Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-18 08:59:13 -04:00
Trond Myklebust	c70701131f	NFS: Ensure we set NFS_CONTEXT_RESEND_WRITES when requeuing writes If a write attempt fails, and the write is queued up for resending to the server, as opposed to being dropped, then we need to set the appropriate flag so that nfs_file_fsync() does the right thing. Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-17 20:00:42 -04:00
Trond Myklebust	1ca018d28d	pNFS: Fix a memory leak when attempted pnfs fails pnfs_do_write() expects the call to pnfs_write_through_mds() to free the pgio header and to release the layout segment before exiting. The problem is that nfs_pgio_data_destroy() doesn't actually do this; it only frees the memory allocated by nfs_generic_pgio(). Ditto for pnfs_do_read()... Fix in both cases is to add a call to hdr->release(hdr). Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-17 20:00:26 -04:00
Trond Myklebust	3438995bc4	NFS: NFSoRDMA Client Changes These patches continue to build up for improving the rsize and wsize that the NFS client uses when talking over RDMA. In addition, these patches also add in scalability enhancements and other bugfixes. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJVey8qAAoJENfLVL+wpUDroT4P/3lwspXwdxS6VZWsW1VpNtdV V1KKd5D+TkpBpz/ih9GdOVaBZijaHpb6XtMReh8xuh0KI893iYmsmLoyhMTPJMsU 6sjUDEv8IFrXwlRKldX1KEfBvNgR0czCNiha6O+YsV5Az08+zr57ahyGKmLUzMxo 4XzPZbwnb5fxvgmBgENUU33g+xXGsXDbsdzLvKW3UGcPU2x6PGOTLr5vP7lQkwxE 20d9ak8xQeRUk0hsmRM4fAebzcluD1o3PLIFQBEhh0Gqm1VGtSCkr9o493gT5TgM /+XrU7B8OnbdJ1B4f/y4Bz4RucfKzyRuXMpulrnK1hL7QIiZLqiph7UrTel/ajcD 0us9PImNwXPo8tMz7Wjw2XMQplndHB3FG3M3lXlJGHlXvCI7F0yjm21AP4SeetOm kxL24Qiyi7l/S7JJxHqNlOc0b8kpVLohBZm6yee9w4r/JUPnynUqfnXCHLjIp/5W F1hzbCUATyfKrSs7VKO0hCQHfntigPEhRmyfoyXRAXzl5LnR1XqD6Wah3a3pwXn+ mEquUd6fKRHIIvJ8cKU6KtykkhRHg1sR/z1mw2ZEW/2PCd0cb+8+WN7X/fQqEN+u +VQSo7oPp38SHdsyozuUUyukN5qHptTMSrNZL+LI7J8/0+BuRuIvW0nojViapc51 LOUlcgqRdUlIvmn754Yo =N1tO -----END PGP SIGNATURE----- Merge tag 'nfs-rdma-for-4.2' of git://git.linux-nfs.org/projects/anna/nfs-rdma NFS: NFSoRDMA Client Changes These patches continue to build up for improving the rsize and wsize that the NFS client uses when talking over RDMA. In addition, these patches also add in scalability enhancements and other bugfixes. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> * tag 'nfs-rdma-for-4.2' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (142 commits) xprtrdma: Reduce per-transport MR allocation xprtrdma: Stack relief in fmr_op_map() xprtrdma: Split rb_lock xprtrdma: Remove rpcrdma_ia::ri_memreg_strategy xprtrdma: Remove ->ro_reset xprtrdma: Remove unused LOCAL_INV recovery logic xprtrdma: Acquire MRs in rpcrdma_register_external() xprtrdma: Introduce an FRMR recovery workqueue xprtrdma: Acquire FMRs in rpcrdma_fmr_register_external() xprtrdma: Introduce helpers for allocating MWs xprtrdma: Use ib_device pointer safely xprtrdma: Remove rr_func xprtrdma: Replace rpcrdma_rep::rr_buffer with rr_rxprt xprtrdma: Warn when there are orphaned IB objects ...	2015-06-16 11:37:50 -04:00
Trond Myklebust	5ba12443a1	NFSv4: Fix stateid recovery on revoked delegations Ensure that we fix the non-NULL stateid case as well. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-16 11:29:51 -04:00
Olga Kornievskaia	ae2ffef383	Recover from stateid-type error on SETATTR Client can receives stateid-type error (eg., BAD_STATEID) on SETATTR when delegation stateid was used. When no open state exists, in case of application calling truncate() on the file, client has no state to recover and fails with EIO. Instead, upon such error, return the bad delegation and then resend the SETATTR with a zero stateid. Signed-off: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-16 11:29:46 -04:00
Kinglong Mee	df05a49f72	nfs: Fix showing truncated fsid/dev in, /proc/net/nfsfs/volumes A truncated fsid showing from /proc/fs/nfsfs/volumes as, NV SERVER PORT DEV FSID FSC v4 c0a80881 801 0:43 34931f044c2a439b no It should be as, NV SERVER PORT DEV FSID FSC v4 c0a80881 801 0:43 34931f044c2a439b:954c5d830fa4be8c no The max buffer length for storing "%llx:%llx" format should be 16 + 1 + 16 + 1 = 34 (16 for %llx, 1 for ':', 1 for '\0'). Also, for storing "%u:%u" of MAJOR() and MINOR() should be 8 + 1 + 3 + 1 = 13 (8 for 2^24, 1 for ':', 3 for 2^8, 1 for '\0'). v2, add comments for dev/fsid buffer and use sizeof in snprintf. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-16 11:17:37 -04:00
Jeff Layton	873e385116	nfs: make nfs4_init_uniform_client_string use a dynamically allocated buffer Change the uniform client string generator to dynamically allocate the NFSv4 client name string buffer. With this patch, we can eliminate the buffers that are embedded within the "args" structs and simply use the name string that is hanging off the client. This uniform string case is a little simpler than the nonuniform since we don't need to deal with RCU, but we do have two different cases, depending on whether there is a uniquifier or not. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-16 11:15:51 -04:00
Jeff Layton	a319268891	nfs: make nfs4_init_nonuniform_client_string use a dynamically allocated buffer The way the *_client_string functions work is a little goofy. They build the string in an on-stack buffer and then use kstrdup to copy it. This is not only stack-heavy but artificially limits the size of the client name string. Change it so that we determine the length of the string, allocate it and then scnprintf into it. Since the contents of the nonuniform string depend on rcu-managed data structures, it's possible that they'll change between when we allocate the string and when we go to fill it. If that happens, free the string, recalculate the length and try again. If it the mismatch isn't resolved on the second try then just give up and return -EINVAL. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-16 11:15:45 -04:00
Jeff Layton	b8fb2f595e	nfs: update maxsz values for SETCLIENTID and EXCHANGE_ID The spec allows for up to NFS4_OPAQUE_LIMIT (1k). While we'll almost certainly never use that much, these ops are generally the only ones in the compound so we might as well allow for them to be that large. Also, the existing code didn't add in a word for the opaque length field for either name string. Fix that while we're in there. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-16 11:15:40 -04:00
Jeff Layton	3a6bb73879	nfs: convert setclientid and exchange_id encoders to use clp->cl_owner_id ...instead of buffers that are part of their arg structs. We already hold a reference to the client, so we might as well use the allocated buffer. In the event that we can't allocate the clp->cl_owner_id, then just return -ENOMEM. Note too that we switch from a GFP_KERNEL allocation here to GFP_NOFS. It's possible we could end up trying to do a SETCLIENTID or EXCHANGE_ID in order to reclaim some memory, and the GFP_KERNEL allocations in the existing code could cause recursion back into NFS reclaim. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-16 11:15:31 -04:00
Fabian Frederick	455b6ee645	pnfs/flexfiles: use swap() in ff_layout_sort_mirrors() Use kernel.h macro definition. Thanks to Julia Lawall for Coccinelle scripting support. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-16 11:14:03 -04:00
Trond Myklebust	4e54ab8d8c	NFS: Ensure that we update the sequence id under the slot table lock Fix a callback slot table regression. Fixes: `e937ee714b` ("nfs: Only update callback sequnce id when CB_SEQUENCE success") Cc: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-11 21:15:52 -04:00
Kinglong Mee	0579c8d208	nfs: Initialize cb_sequenceres information before validate_seqid() For a cb_layoutrecall replay, nfsd got CB_SEQUENCE status of zero, but all informations of cb_sequenceres are zero too !!! validate_seqid() return NFS4ERR_RETRY_UNCACHED_REP for a replay, and skip the initlize cb_sequenceres. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-11 21:09:06 -04:00
Jeff Layton	6f02dc88be	nfs: deny backchannel RPCs with an incorrect authflavor instead of dropping them A drop should really only be done when the frame is malformed or we have reason to think that there is some sort of DoS going on. When we get an RPC with bad auth, we should send back an error instead. Cc: Andy Adamson <William.Adamson@netapp.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-11 14:06:34 -04:00
Kinglong Mee	e937ee714b	nfs: Only update callback sequnce id when CB_SEQUENCE success When testing pnfs layout, nfsd got error NFS4ERR_SEQ_MISORDERED. It is caused by nfs return NFS4ERR_DELAY before validate_seqid(), don't update the sequnce id, but nfsd updates the sequnce id !!! According to RFC5661 20.9.3, " If CB_SEQUENCE returns an error, then the state of the slot (sequence ID, cached reply) MUST NOT change. " Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-11 14:00:40 -04:00
Vaishali Thakkar	4ed0d83d05	NFS: Convert use of __constant_htonl to htonl In little endian cases, the macro htonl unfolds to __swab32 which provides special case for constants. In big endian cases, __constant_htonl and htonl expand directly to the same expression. So, replace __constant_htonl with htonl with the goal of getting rid of the definition of __constant_htonl completely. The semantic patch that performs this transformation is as follows: @@expression x;@@ - __constant_htonl(x) + htonl(x) Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-10 18:57:59 -04:00
Anna Schumaker	11598b8ff2	NFS: Remove unused nfs_rw_ops->rw_release() function This was only ever set to nfs_writeback_release_common(), a function which is completely empty. Let's just drop this function pointer and simplify the code a bit. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-10 18:32:40 -04:00
Dominique Martinet	c86c90c656	NFSv4: handle nfs4_get_referral failure nfs4_proc_lookup_common is supposed to return a posix error, we have to handle any error returned that isn't errno Reported-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Frank S. Filz <ffilzlnx@mindspring.com> Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-10 18:28:02 -04:00
Jeff Layton	3c87ef6efb	sunrpc: keep a count of swapfiles associated with the rpc_clnt Jerome reported seeing a warning pop when working with a swapfile on NFS. The nfs_swap_activate can end up calling sk_set_memalloc while holding the rcu_read_lock and that function can sleep. To fix that, we need to take a reference to the xprt while holding the rcu_read_lock, set the socket up for swapping and then drop that reference. But, xprt_put is not exported and having NFS deal with the underlying xprt is a bit of layering violation anyway. Fix this by adding a set of activate/deactivate functions that take a rpc_clnt pointer instead of an rpc_xprt, and have nfs_swap_activate and nfs_swap_deactivate call those. Also, add a per-rpc_clnt atomic counter to keep track of the number of active swapfiles associated with it. When the counter does a 0->1 transition, we enable swapping on the xprt, when we do a 1->0 transition we disable swapping on it. This also allows us to be a bit more selective with the RPC_TASK_SWAPPER flag. If non-swapper and swapper clnts are sharing a xprt, then we only need to flag the tasks from the swapper clnt with that flag. Acked-by: Mel Gorman <mgorman@suse.de> Reported-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-10 18:26:14 -04:00
Trond Myklebust	8eee52af27	NFSv4: nfs4_handle_delegation_recall_error should ignore EAGAIN EAGAIN is a valid return code from nfs4_open_recover(), and should be handled by nfs4_handle_delegation_recall_error by simply passing it through. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-04 13:51:13 -04:00
Sasha Levin	161f873b89	vfs: read file_handle only once in handle_to_path We used to read file_handle twice. Once to get the amount of extra bytes, and once to fetch the entire structure. This may be problematic since we do size verifications only after the first read, so if the number of extra bytes changes in userspace between the first and second calls, we'll have an incoherent view of file_handle. Instead, read the constant size once, and copy that over to the final structure without having to re-read it again. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-06-02 10:29:07 -07:00
Julia Lawall	13985b1f77	NFS: drop unneeded goto Delete jump to a label on the next line, when that label is not used elsewhere. A simplified version of the semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r@ identifier l; @@ -if (...) goto l; -l: // </smpl> Also drop the unnecessary ret variable. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-02 08:55:28 -04:00
Chuck Lever	d683cc49da	NFS: Fix size of NFSACL SETACL operations When encoding the NFSACL SETACL operation, reserve just the estimated size of the ACL rather than a fixed maximum. This eliminates needless zero padding on the wire that the server ignores. Fixes: `ee5dc7732b` ('NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!"') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-02 08:55:28 -04:00
NeilBrown	7ef5ca4fe4	NFS: report more appropriate block size for directories. In glibc 2.21 (and several previous), a call to opendir() will result in a 32K (BUFSIZ4) buffer being allocated and passed to getdents. However a call to fdopendir() results in an 'fstat' request to determine block size and a matching buffer allocated for subsequent use with getdents. This will typically be 1M. The first getdents call on an NFS directory will always use READDIR_PLUS (or NFSv4 equivalent) if available. Subsequent getdents calls only use this more expensive version if some 'stat' requests are made between the getdents calls. For this reason it is good to keep at least that first getdents call relatively short. When fdopendir() and readdir() is used on a large directory, it takes approximately 32 times as long to complete as using "opendir". Current versions of 'find' use fdopendir() and demonstrate this slowness. 'stat' on a directory currently returns the 'wsize'. This number has no meaning on directories. Actual READDIR requests are limited to ->dtsize, which itself is capped at 4 pages, coincidently the same as BUFSIZ4. So this is a meaningful number to use as the blocksize on directories, and has the effect of making 'find' on large directories go a lot faster. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-02 08:55:27 -04:00
Trond Myklebust	5cae02f427	NFSv4: Always drain the slot table before re-establishing the lease While the NFSv4.1 code has always drained the slot tables in order to stop non-recovery related RPC calls when doing lease recovery, the NFSv4 code did not. The reason for the difference in behaviour is that NFSv4 does not have session state, and so RPC calls can in theory proceed while recovery is happening. In practice, however, anything I/O or state related needs to wait until recovery is over. This patch changes the behaviour of NFSv4 to match that of NFSv4.1 so that we can simplify the state recovery code by assuming that we do not have to deal with races between recovery and ordinary I/O. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-02 08:55:27 -04:00
Olga Kornievskaia	e8d975e73e	fixing infinite OPEN loop in 4.0 stateid recovery Problem: When an operation like WRITE receives a BAD_STATEID, even though recovery code clears the RECLAIM_NOGRACE recovery flag before recovering the open state, because of clearing delegation state for the associated inode, nfs_inode_find_state_and_recover() gets called and it makes the same state with RECLAIM_NOGRACE flag again. As a results, when we restart looking over the open states, we end up in the infinite loop instead of breaking out in the next test of state flags. Solution: unset the RECLAIM_NOGRACE set because of calling of nfs_inode_find_state_and_recover() after returning from calling recover_open() function. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-06-01 09:58:02 -04:00
Linus Torvalds	8ba64dc338	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fix from Al Viro: "Off-by-one in d_walk()/__dentry_kill() race fix. It's very hard to hit; possible in the same conditions as the original bug, except that you need the skipped branch to contain all the remaining evictables, so that the d_walk()-calling loop in d_invalidate() decides there's nothing more to do and doesn't go for another pass - otherwise that next pass will sweep the sucker. So it's not too urgent, but seeing that the fix is obvious and the original commit has spread into all -stable branches..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: d_walk() might skip too much	2015-05-31 16:00:34 -07:00
Linus Torvalds	1be44e234b	xfs: update for 4.1-rc6 Changes in this update: o regression fix for new rename whiteout code o regression fixes for new superblock generic per-cpu counter code o fix for incorrect error return sign introduced in 3.17 o metadata corruption fixes that need to go back to -stable kernels -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJVaO0JAAoJEK3oKUf0dfodd4UQANRdfXnUrpyQGhVS7HFoFoVt FIQ52pPGbMu72+DqHc+Q41uvgAPe65LFB2VUL6CUGCMExstF72F5+QonzppMgkMo unPER3eB8ya03SY+Kp+803ZGgzI2Nl2M6w8Kof730/RUk56PTGYIx4eLXd6iZSli RsYjw8JDbeue5OQo5FPmLCSQ/Kr5ZJXbgWVPyWkKg9aCcXLN5YSJIV3xcMTK9Q2I LqqODkyatnGc6YxGAKddS7Xzt1ntlZgbe5mndQw04a2g0Lf6emPH5r8b0UJXIu96 advOBX0pEbad4FeFS6Mo5D+nNCaaNP4WzN7wgdb+BYNVw3ss4Ebam7+yY6Gexg6y bzZOEkk9saL4YeBDgyYICNu7kG4BRVKRQiiX220G6SFXM3nqbl7qBPb3kVFyDpcI RRuFJ0ZV0kFJ+3IQ4xVnIh6nootceRk/mvZaK5HhLhQLzklpZ8fj4HF3oBDUAnvN wNd+7GoZy7zldjCkbF4BP3GjUeW+b9ngrCNc+bFXi5cUbdECXAa2krjxyY+MlQF2 veNVVcsoRdfeM0VjJh2/piGJxMWIlRqXdKzPKsfMWnlIaJ6YyslfbSq+2K7LxgGR Ho3Sjt0oUuPMZ9F/Mjj+XDqwmzgooUHXNyDBxhGXBNBPjApcRLcb2vQ2SrWEmeGJ vZmC2R1ZoGdBJg8a55BT =w5SP -----END PGP SIGNATURE----- Merge tag 'xfs-for-linus-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs fixes from Dave Chinner: "This is a little larger than I'd like late in the release cycle, but all the fixes are for regressions introduced in the 4.1-rc1 merge, or are needed back in -stable kernels fairly quickly as they are filesystem corruption or userspace visible correctness issues. Changes in this update: - regression fix for new rename whiteout code - regression fixes for new superblock generic per-cpu counter code - fix for incorrect error return sign introduced in 3.17 - metadata corruption fixes that need to go back to -stable kernels" * tag 'xfs-for-linus-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: xfs: fix broken i_nlink accounting for whiteout tmpfile inode xfs: xfs_iozero can return positive errno xfs: xfs_attr_inactive leaves inconsistent attr fork state behind xfs: extent size hints can round up extents past MAXEXTLEN xfs: inode and free block counters need to use __percpu_counter_compare percpu_counter: batch size aware __percpu_counter_compare() xfs: use percpu_counter_read_positive for mp->m_icount	2015-05-29 16:45:45 -07:00
Al Viro	2159184ea0	d_walk() might skip too much when we find that a child has died while we'd been trying to ascend, we should go into the first live sibling itself, rather than its sibling. Off-by-one in question had been introduced in "deal with deadlock in d_walk()" and the fix needs to be backported to all branches this one has been backported to. Cc: stable@vger.kernel.org # 3.2 and later Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-28 23:45:30 -04:00
Bob Copeland	5a6b2b36a8	omfs: fix potential integer overflow in allocator Both 'i' and 'bits_per_entry' are signed integers but the result is a u64 block number. Cast i to u64 to avoid truncation on 32-bit targets. Found by Coverity (CID 200679). Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-05-28 18:25:19 -07:00
Bob Copeland	c0345ee57d	omfs: fix sign confusion for bitmap loop counter The count variable is used to iterate down to (below) zero from the size of the bitmap and handle the one-filling the remainder of the last partial bitmap block. The loop conditional expects count to be signed in order to detect when the final block is processed, after which count goes negative. Unfortunately, a recent change made this unsigned along with some other related fields. The result of is this is that during mount, omfs_get_imap will overrun the bitmap array and corrupt memory unless number of blocks happens to be a multiple of 8 * blocksize. Fix by changing count back to signed: it is guaranteed to fit in an s32 without overflow due to an enforced limit on the number of blocks in the filesystem. Signed-off-by: Bob Copeland <me@bobcopeland.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-05-28 18:25:19 -07:00
Bob Copeland	3a281f9466	omfs: set error return when d_make_root() fails A static checker found the following issue in the error path for omfs_fill_super: fs/omfs/inode.c:552 omfs_fill_super() warn: missing error code here? 'd_make_root()' failed. 'ret' = '0' Fix by returning -ENOMEM in this case. Signed-off-by: Bob Copeland <me@bobcopeland.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-05-28 18:25:18 -07:00
Sasha Levin	dcbff39da3	fs, omfs: add NULL terminator in the end up the token list match_token() expects a NULL terminator at the end of the token list so that it would know where to stop. Not having one causes it to overrun to invalid memory. In practice, passing a mount option that omfs didn't recognize would sometimes panic the system. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Bob Copeland <me@bobcopeland.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-05-28 18:25:18 -07:00
Andrew Morton	2b1d3ae940	fs/binfmt_elf.c:load_elf_binary(): return -EINVAL on zero-length mappings load_elf_binary() returns `retval', not `error'. Fixes: `a87938b2e2` ("fs/binfmt_elf.c: fix bug in loading of PIE binaries") Reported-by: James Hogan <james.hogan@imgtec.com> Cc: Michael Davidson <md@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-05-28 18:25:18 -07:00
Brian Foster	22419ac9fe	xfs: fix broken i_nlink accounting for whiteout tmpfile inode XFS uses the internal tmpfile() infrastructure for the whiteout inode used for RENAME_WHITEOUT operations. For tmpfile inodes, XFS allocates the inode, drops di_nlink, adds the inode to the agi unlinked list, calls d_tmpfile() which correspondingly drops i_nlink of the vfs inode, and then finishes the common inode setup (e.g., clear I_NEW and unlock). The d_tmpfile() call was originally made inxfs_create_tmpfile(), but was pulled up out of that function as part of the following commit to resolve a deadlock issue: `330033d6` xfs: fix tmpfile/selinux deadlock and initialize security As a result, callers of xfs_create_tmpfile() are responsible for either calling d_tmpfile() or fixing up i_nlink appropriately. The whiteout tmpfile allocation helper does neither. As a result, the vfs ->i_nlink becomes inconsistent with the on-disk ->di_nlink once xfs_rename() links it back into the source dentry and calls xfs_bumplink(). Update the assert in xfs_rename() to help detect this problem in the future and update xfs_rename_alloc_whiteout() to decrement the link count as part of the manual tmpfile inode setup. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2015-05-29 08:14:55 +10:00
Dave Chinner	cddc116228	xfs: xfs_iozero can return positive errno It was missed when we converted everything in XFs to use negative error numbers, so fix it now. Bug introduced in 3.17 by commit `2451337` ("xfs: global error sign conversion"), and should go back to stable kernels. Thanks to Brian Foster for noticing it. cc: <stable@vger.kernel.org> # 3.17, 3.18, 3.19, 4.0 Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2015-05-29 07:40:32 +10:00
Dave Chinner	6dfe5a049f	xfs: xfs_attr_inactive leaves inconsistent attr fork state behind xfs_attr_inactive() is supposed to clean up the attribute fork when the inode is being freed. While it removes attribute fork extents, it completely ignores attributes in local format, which means that there can still be active attributes on the inode after xfs_attr_inactive() has run. This leads to problems with concurrent inode writeback - the in-core inode attribute fork is removed without locking on the assumption that nothing will be attempting to access the attribute fork after a call to xfs_attr_inactive() because it isn't supposed to exist on disk any more. To fix this, make xfs_attr_inactive() completely remove all traces of the attribute fork from the inode, regardless of it's state. Further, also remove the in-core attribute fork structure safely so that there is nothing further that needs to be done by callers to clean up the attribute fork. This means we can remove the in-core and on-disk attribute forks atomically. Also, on error simply remove the in-memory attribute fork. There's nothing that can be done with it once we have failed to remove the on-disk attribute fork, so we may as well just blow it away here anyway. cc: <stable@vger.kernel.org> # 3.12 to 4.0 Reported-by: Waiman Long <waiman.long@hp.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2015-05-29 07:40:08 +10:00
Dave Chinner	6dea405eee	xfs: extent size hints can round up extents past MAXEXTLEN This results in BMBT corruption, as seen by this test: # mkfs.xfs -f -d size=40051712b,agcount=4 /dev/vdc .... # mount /dev/vdc /mnt/scratch # xfs_io -ft -c "extsize 16m" -c "falloc 0 30g" -c "bmap -vp" /mnt/scratch/foo which results in this failure on a debug kernel: XFS: Assertion failed: (blockcount & xfs_mask64hi(64-BMBT_BLOCKCOUNT_BITLEN)) == 0, file: fs/xfs/libxfs/xfs_bmap_btree.c, line: 211 .... Call Trace: [<ffffffff814cf0ff>] xfs_bmbt_set_allf+0x8f/0x100 [<ffffffff814cf18d>] xfs_bmbt_set_all+0x1d/0x20 [<ffffffff814f2efe>] xfs_iext_insert+0x9e/0x120 [<ffffffff814c7956>] ? xfs_bmap_add_extent_hole_real+0x1c6/0xc70 [<ffffffff814c7956>] xfs_bmap_add_extent_hole_real+0x1c6/0xc70 [<ffffffff814caaab>] xfs_bmapi_write+0x72b/0xed0 [<ffffffff811c72ac>] ? kmem_cache_alloc+0x15c/0x170 [<ffffffff814fe070>] xfs_alloc_file_space+0x160/0x400 [<ffffffff81ddcc29>] ? down_write+0x29/0x60 [<ffffffff815063eb>] xfs_file_fallocate+0x29b/0x310 [<ffffffff811d2bc8>] ? __sb_start_write+0x58/0x120 [<ffffffff811e3e18>] ? do_vfs_ioctl+0x318/0x570 [<ffffffff811cd680>] vfs_fallocate+0x140/0x260 [<ffffffff811ce6f8>] SyS_fallocate+0x48/0x80 [<ffffffff81ddec09>] system_call_fastpath+0x12/0x17 The tracepoint that indicates the extent that triggered the assert failure is: xfs_iext_insert: idx 0 offset 0 block 16777224 count 2097152 flag 1 Clearly indicating that the extent length is greater than MAXEXTLEN, which is 2097151. A prior trace point shows the allocation was an exact size match and that a length greater than MAXEXTLEN was asked for: xfs_alloc_size_done: agno 1 agbno 8 minlen 2097152 maxlen 2097152 ^^^^^^^ ^^^^^^^ We don't see this problem with extent size hints through the IO path because we can't do single IOs large enough to trigger MAXEXTLEN allocation. fallocate(), OTOH, is not limited in it's allocation sizes and so needs help here. The issue is that the extent size hint alignment is rounding up the extent size past MAXEXTLEN, because xfs_bmapi_write() is not taking into account extent size hints when calculating the maximum extent length to allocate. xfs_bmapi_reserve_delalloc() is already doing this, but direct extent allocation is not. Unfortunately, the calculation in xfs_bmapi_reserve_delalloc() is wrong, and it works only because delayed allocation extents are not limited in size to MAXEXTLEN in the in-core extent tree. hence this calculation does not work for direct allocation, and the delalloc code needs fixing. This may, in fact be the underlying bug that occassionally causes transaction overruns in delayed allocation extent conversion, so now we know it's wrong we should fix it, too. Many thanks to Brian Foster for finding this problem during review of this patch. Hence the fix, after much code reading, is to allow xfs_bmap_extsize_align() to align partial extents when full alignment would extend the alignment past MAXEXTLEN. We can safely do this because all callers have higher layer allocation loops that already handle short allocations, and so will simply run another allocation to cover the remainder of the requested allocation range that we ignored during alignment. The advantage of this approach is that it also removes the need for callers to do anything other than limit their requests to MAXEXTLEN - they don't really need to be aware of extent size hints at all. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2015-05-29 07:40:06 +10:00
Dave Chinner	8c1903d308	xfs: inode and free block counters need to use __percpu_counter_compare Because the counters use a custom batch size, the comparison functions need to be aware of that batch size otherwise the comparison does not work correctly. This leads to ASSERT failures on generic/027 like this: XFS: Assertion failed: 0, file: fs/xfs/xfs_mount.c, line: 1099 ------------[ cut here ]------------ .... Call Trace: [<ffffffff81522a39>] xfs_mod_icount+0x99/0xc0 [<ffffffff815285cb>] xfs_trans_unreserve_and_mod_sb+0x28b/0x5b0 [<ffffffff8152f941>] xfs_log_commit_cil+0x321/0x580 [<ffffffff81528e17>] xfs_trans_commit+0xb7/0x260 [<ffffffff81503d4d>] xfs_bmap_finish+0xcd/0x1b0 [<ffffffff8151da41>] xfs_inactive_ifree+0x1e1/0x250 [<ffffffff8151dbe0>] xfs_inactive+0x130/0x200 [<ffffffff81523a21>] xfs_fs_evict_inode+0x91/0xf0 [<ffffffff811f3958>] evict+0xb8/0x190 [<ffffffff811f433b>] iput+0x18b/0x1f0 [<ffffffff811e8853>] do_unlinkat+0x1f3/0x320 [<ffffffff811d548a>] ? filp_close+0x5a/0x80 [<ffffffff811e999b>] SyS_unlinkat+0x1b/0x40 [<ffffffff81e0892e>] system_call_fastpath+0x12/0x71 This is a regression introduced by commit `501ab32` ("xfs: use generic percpu counters for inode counter"). This patch fixes the same problem for both the inode counter and the free block counter in the superblocks. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2015-05-29 07:39:34 +10:00
George Wang	74f9ce1cf2	xfs: use percpu_counter_read_positive for mp->m_icount Function percpu_counter_read just return the current counter, which can be negative. This will cause the checking of "allocated inode counts <= m_maxicount" false positive. Use percpu_counter_read_positive can solve this problem, and be consistent with the purpose to introduce percpu mechanism to xfs. Signed-off-by: George Wang <xuw2015@gmail.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2015-05-29 07:39:34 +10:00
Linus Torvalds	de182468d1	Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Back from SambaXP - now have 8 small CIFS bug fixes to merge" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Fix race condition on RFC1002_NEGATIVE_SESSION_RESPONSE Fix to convert SURROGATE PAIR cifs: potential missing check for posix_lock_file_wait Fix to check Unique id and FileType when client refer file directly. CIFS: remove an unneeded NULL check [cifs] fix null pointer check Fix that several functions handle incorrect value of mapchars cifs: Don't replace dentries for dfs mounts	2015-05-27 14:09:16 -07:00
Linus Torvalds	3cfd4ba7d3	Merge branch 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull two overlayfs fixes from Miklos Szeredi: "Overlayfs rmdir() failed to check for emptiness in one case; this was introduced in 4.0. The other bug was there since day one: failure to mount if upper fs is full, which bit some OpenWRT folks" * 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: mount read-only if workdir can't be created ovl: don't remove non-empty opaque directory	2015-05-27 09:47:57 -07:00
Linus Torvalds	7ce14f6ff2	Merge branch 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "I fixed up a regression from 4.0 where conversion between different raid levels would sometimes bail out without converting. Filipe tracked down a race where it was possible to double allocate chunks on the drive. Mark has a fix for fiemap. All three will get bundled off for stable as well" * 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix regression in raid level conversion Btrfs: fix racy system chunk allocation when setting block group ro btrfs: clear 'ret' in btrfs_check_shared() loop	2015-05-23 11:14:10 -07:00
Federico Sauter	4afe260bab	CIFS: Fix race condition on RFC1002_NEGATIVE_SESSION_RESPONSE This patch fixes a race condition that occurs when connecting to a NT 3.51 host without specifying a NetBIOS name. In that case a RFC1002_NEGATIVE_SESSION_RESPONSE is received and the SMB negotiation is reattempted, but under some conditions it leads SendReceive() to hang forever while waiting for srv_mutex. This, in turn, sets the calling process to an uninterruptible sleep state and makes it unkillable. The solution is to unlock the srv_mutex acquired in the demux thread before going to sleep (after the reconnect error) and before reattempting the connection.	2015-05-20 13:25:55 -05:00

1 2 3 4 5 ...

40483 Commits