linux

Author	SHA1	Message	Date
Jaegeuk Kim	4b2fecc846	f2fs: introduce FITRIM in f2fs_ioctl This patch introduces FITRIM in f2fs_ioctl. In this case, f2fs will issue small discards and prefree discards as many as possible for the given area. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-30 15:06:09 -07:00
Jaegeuk Kim	75ab4cb830	f2fs: introduce cp_control structure This patch add a new data structure to control checkpoint parameters. Currently, it presents the reason of checkpoint such as is_umount and normal sync. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-30 15:01:28 -07:00
Trond Myklebust	72c23f0819	Merge branch 'bugfixes' into linux-next * bugfixes: NFSv4.1: Fix an NFSv4.1 state renewal regression NFSv4: fix open/lock state recovery error handling NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails NFS: Fabricate fscache server index key correctly SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT nfs: fix duplicate proc entries	2014-09-30 17:21:41 -04:00
Andy Adamson	d1f456b0b9	NFSv4.1: Fix an NFSv4.1 state renewal regression Commit `2f60ea6b8c` ("NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegation") set the NFS4_RENEW_TIMEOUT flag in nfs4_renew_state, and does not put an nfs41_proc_async_sequence call, the NFSv4.1 lease renewal heartbeat call, on the wire to renew the NFSv4.1 state if the flag was not set. The NFS4_RENEW_TIMEOUT flag is set when "now" is after the last renewal (cl_last_renewal) plus the lease time divided by 3. This is arbitrary and sometimes does the following: In normal operation, the only way a future state renewal call is put on the wire is via a call to nfs4_schedule_state_renewal, which schedules a nfs4_renew_state workqueue task. nfs4_renew_state determines if the NFS4_RENEW_TIMEOUT should be set, and the calls nfs41_proc_async_sequence, which only gets sent if the NFS4_RENEW_TIMEOUT flag is set. Then the nfs41_proc_async_sequence rpc_release function schedules another state remewal via nfs4_schedule_state_renewal. Without this change we can get into a state where an application stops accessing the NFSv4.1 share, state renewal calls stop due to the NFS4_RENEW_TIMEOUT flag _not_ being set. The only way to recover from this situation is with a clientid re-establishment, once the application resumes and the server has timed out the lease and so returns NFS4ERR_BAD_SESSION on the subsequent SEQUENCE operation. An example application: open, lock, write a file. sleep for 6 * lease (could be less) ulock, close. In the above example with NFSv4.1 delegations enabled, without this change, there are no OP_SEQUENCE state renewal calls during the sleep, and the clientid is recovered due to lease expiration on the close. This issue does not occur with NFSv4.1 delegations disabled, nor with NFSv4.0, with or without delegations enabled. Signed-off-by: Andy Adamson <andros@netapp.com> Link: http://lkml.kernel.org/r/1411486536-23401-1-git-send-email-andros@netapp.com Fixes: `2f60ea6b8c` (NFSv4: The NFSv4.0 client must send RENEW calls...) Cc: stable@vger.kernel.org # 3.2.x Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-30 17:18:42 -04:00
J. Bruce Fields	15b23ef5d3	nfsd4: fix corruption of NFSv4 read data The calculation of page_ptr here is wrong in the case the read doesn't start at an offset that is a multiple of a page. The result is that nfs4svc_encode_compoundres sets rq_next_page to a value one too small, and then the loop in svc_free_res_pages may incorrectly fail to clear a page pointer in rq_respages[]. Pages left in rq_respages[] are available for the next rpc request to use, so xdr data may be written to that page, which may hold data still waiting to be transmitted to the client or data in the page cache. The observed result was silent data corruption seen on an NFSv4 client. We tag this as "fixing" `05638dc73a` because that commit exposed this bug, though the incorrect calculation predates it. Particular thanks to Andrea Arcangeli and David Gilbert for analysis and testing. Fixes: `05638dc73a` "nfsd4: simplify server xdr->next_page use" Cc: stable@vger.kernel.org Reported-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-30 15:57:04 -04:00
Anna Schumaker	24bab49122	NFSD: Implement SEEK This patch adds server support for the NFS v4.2 operation SEEK, which returns the position of the next hole or data segment in a file. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-29 14:35:20 -04:00
Anna Schumaker	87a15a8090	NFSD: Add generic v4.2 infrastructure It's cleaner to introduce everything at once and have the server reply with "not supported" than it would be to introduce extra operations when implementing a specific one in the middle of the list. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-29 14:35:19 -04:00
Trond Myklebust	df817ba357	NFSv4: fix open/lock state recovery error handling The current open/lock state recovery unfortunately does not handle errors such as NFS4ERR_CONN_NOT_BOUND_TO_SESSION correctly. Instead of looping, just proceeds as if the state manager is finished recovering. This patch ensures that we loop back, handle higher priority errors and complete the open/lock state recovery. Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-28 16:03:04 -04:00
Trond Myklebust	a4339b7b68	NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails If a NFSv4.x server returns NFS4ERR_STALE_CLIENTID in response to a CREATE_SESSION or SETCLIENTID_CONFIRM in order to tell us that it rebooted a second time, then the client will currently take this to mean that it must declare all locks to be stale, and hence ineligible for reboot recovery. RFC3530 and RFC5661 both suggest that the client should instead rely on the server to respond to inelegible open share, lock and delegation reclaim requests with NFS4ERR_NO_GRACE in this situation. Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-28 16:03:03 -04:00
Linus Torvalds	1e3827bf8a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Assorted fixes + unifying __d_move() and __d_materialise_dentry() + minimal regression fix for d_path() of victims of overwriting rename() ported on top of that" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: Don't exchange "short" filenames unconditionally. fold swapping ->d_name.hash into switch_names() fold unlocking the children into dentry_unlock_parents_for_move() kill __d_materialise_dentry() __d_materialise_dentry(): flip the order of arguments __d_move(): fold manipulations with ->d_child/->d_subdirs don't open-code d_rehash() in d_materialise_unique() pull rehashing and unlocking the target dentry into __d_materialise_dentry() ufs: deal with nfsd/iget races fuse: honour max_read and max_write in direct_io mode shmem: fix nlink for rename overwrite directory	2014-09-27 17:05:14 -07:00
Mikhail Efremov	d2fa4a8476	vfs: Don't exchange "short" filenames unconditionally. Only exchange source and destination filenames if flags contain RENAME_EXCHANGE. In case if executable file was running and replaced by other file /proc/PID/exe should still show correct file name, not the old name of the file by which it was replaced. The scenario when this bug manifests itself was like this: * ALT Linux uses rpm and start-stop-daemon; * during a package upgrade rpm creates a temporary file for an executable to rename it upon successful unpacking; * start-stop-daemon is run subsequently and it obtains the (nonexistant) temporary filename via /proc/PID/exe thus failing to identify the running process. Note that "long" filenames (> DNAiME_INLINE_LEN) are still exchanged without RENAME_EXCHANGE and this behaviour exists long enough (should be fixed too apparently). So this patch is just an interim workaround that restores behavior for "short" names as it was before changes introduced by commit `da1ce0670c` ("vfs: add cross-rename"). See https://lkml.org/lkml/2014/9/7/6 for details. AV: the comments about being more careful with ->d_name.hash than with ->d_name.name are from back in 2.3.40s; they became obsolete by 2.3.60s, when we started to unhash the target instead of swapping hash chain positions followed by d_delete() as we used to do when dcache was first introduced. Acked-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: stable@vger.kernel.org Fixes: `da1ce0670c` "vfs: add cross-rename" Signed-off-by: Mikhail Efremov <sem@altlinux.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-09-27 15:59:39 -04:00
Linus Torvalds	a28ddb87cd	fold swapping ->d_name.hash into switch_names() and do it along with ->d_name.len there Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-09-27 15:59:11 -04:00
Al Viro	986c01942a	fold unlocking the children into dentry_unlock_parents_for_move() ... renaming it into dentry_unlock_for_move() and making it more symmetric with dentry_lock_for_move(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-09-26 23:11:15 -04:00
Al Viro	63cf427a57	kill __d_materialise_dentry() it folds into __d_move() now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-09-26 23:06:14 -04:00
Al Viro	4453641fe8	__d_materialise_dentry(): flip the order of arguments ... thus making it much closer to (now unreachable, BTW) IS_ROOT(dentry) case in __d_move(). A bit more and it'll fold in. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-09-26 22:54:02 -04:00
Al Viro	9d8cd306a8	__d_move(): fold manipulations with ->d_child/->d_subdirs list_del() + list_add() is a slightly pessimised list_move() list_del() + INIT_LIST_HEAD() is a slightly pessimised list_del_init() Interleaving those makes the resulting code even worse. And harder to follow... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-09-26 21:34:01 -04:00
Al Viro	8527dd7187	don't open-code d_rehash() in d_materialise_unique() ... and get rid of duplicate BUG_ON() there Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-09-26 21:26:50 -04:00
Al Viro	5cc3821b57	pull rehashing and unlocking the target dentry into __d_materialise_dentry() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-09-26 21:25:35 -04:00
Al Viro	e4502c63f5	ufs: deal with nfsd/iget races Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-09-26 21:17:52 -04:00
Miklos Szeredi	2c80929c4c	fuse: honour max_read and max_write in direct_io mode The third argument of fuse_get_user_pages() "nbytesp" refers to the number of bytes a caller asked to pack into fuse request. This value may be lesser than capacity of fuse request or iov_iter. So fuse_get_user_pages() must ensure that *nbytesp won't grow. Now, when helper iov_iter_get_pages() performs all hard work of extracting pages from iov_iter, it can be done by passing properly calculated "maxsize" to the helper. The other caller of iov_iter_get_pages() (dio_refill_pages()) doesn't need this capability, so pass LONG_MAX as the maxsize argument here. Fixes: `c9c37e2e63` ("fuse: switch to iov_iter_get_pages()") Reported-by: Werner Baumann <werner.baumann@onlinehome.de> Tested-by: Maxim Patlasov <mpatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-09-26 21:16:51 -04:00
Christoph Hellwig	0162ac2b97	nfsd: introduce nfsd4_callback_ops Add a higher level abstraction than the rpc_ops for callback operations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-26 16:29:29 -04:00
Christoph Hellwig	f0b5de1b6b	nfsd: split nfsd4_callback initialization and use Split out initializing the nfs4_callback structure from using it. For the NULL callback this gets rid of tons of pointless re-initializations. Note that I don't quite understand what protects us from running multiple NULL callbacks at the same time, but at least this chance doesn't make it worse.. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-26 16:29:28 -04:00
Christoph Hellwig	326129d02a	nfsd: introduce a generic nfsd4_cb Add a helper to queue up a callback. CB_NULL has a bit of special casing because it is special in the specification, but all other new callback operations will be able to share code with this and a few more changes to refactor the callback code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-26 16:29:27 -04:00
Christoph Hellwig	2faf3b4350	nfsd: remove nfsd4_callback.cb_op We can always get at the private data by using container_of, no need for a void pointer. Also introduce a little to_delegation helper to avoid opencoding the container_of everywhere. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-26 16:29:26 -04:00
Benny Halevy	341b51df1f	nfsd: do not clear rpc_resp in nfsd4_cb_done_sequence This is incorrect when a callback is has to be restarted, in which case the XDR decoding of the second iteration will see a NULL cb argument. [hch: updated description] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-26 16:29:25 -04:00
Christoph Hellwig	444b6e910d	nfsd: fix nfsd4_cb_recall_done error handling For any error that is not EBADHANDLE or NFS4ERR_BAD_STATEID, nfsd4_cb_recall_done first marks the connection down, then retries until dl_retries hits zero, then marks the connection down again and sets cb_done. This changes the code to only retry for EBADHANDLE or NFS4ERR_BAD_STATEID, and factors setting cb_done into a single point in the function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-26 16:29:25 -04:00
Fabian Frederick	6ff66ac77a	fs/cachefiles: add missing \n to kerror conversions Commit `0227d6abb3` ("fs/cachefiles: replace kerror by pr_err") didn't include newline featuring in original kerror definition Signed-off-by: Fabian Frederick <fabf@skynet.be> Reported-by: David Howells <dhowells@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> [3.16.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-09-26 08:10:35 -07:00
Peter Feiner	87e6d49a00	mm: softdirty: addresses before VMAs in PTE holes aren't softdirty In PTE holes that contain VM_SOFTDIRTY VMAs, unmapped addresses before VM_SOFTDIRTY VMAs are reported as softdirty by /proc/pid/pagemap. This bug was introduced in commit `68b5a65248` ("mm: softdirty: respect VM_SOFTDIRTY in PTE holes"). That commit made /proc/pid/pagemap look at VM_SOFTDIRTY in PTE holes but neglected to observe the start of VMAs returned by find_vma. Tested: Wrote a selftest that creates a PMD-sized VMA then unmaps the first page and asserts that the page is not softdirty. I'm going to send the pagemap selftest in a later commit. Signed-off-by: Peter Feiner <pfeiner@google.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Hugh Dickins <hughd@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Jamie Liu <jamieliu@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-09-26 08:10:35 -07:00
Joseph Qi	5760a97c71	ocfs2/dlm: do not get resource spinlock if lockres is new There is a deadlock case which reported by Guozhonghua: https://oss.oracle.com/pipermail/ocfs2-devel/2014-September/010079.html This case is caused by &res->spinlock and &dlm->master_lock misordering in different threads. It was introduced by commit `8d400b81cc` ("ocfs2/dlm: Clean up refmap helpers"). Since lockres is new, it doesn't not require the &res->spinlock. So remove it. Fixes: `8d400b81cc` ("ocfs2/dlm: Clean up refmap helpers") Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: joyce.xue <xuejiufei@huawei.com> Reported-by: Guozhonghua <guozhonghua@h3c.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-09-26 08:10:34 -07:00
Andreas Rohner	56d7acc792	nilfs2: fix data loss with mmap() This bug leads to reproducible silent data loss, despite the use of msync(), sync() and a clean unmount of the file system. It is easily reproducible with the following script: ----------------[BEGIN SCRIPT]-------------------- mkfs.nilfs2 -f /dev/sdb mount /dev/sdb /mnt dd if=/dev/zero bs=1M count=30 of=/mnt/testfile umount /mnt mount /dev/sdb /mnt CHECKSUM_BEFORE="$(md5sum /mnt/testfile)" /root/mmaptest/mmaptest /mnt/testfile 30 10 5 sync CHECKSUM_AFTER="$(md5sum /mnt/testfile)" umount /mnt mount /dev/sdb /mnt CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)" umount /mnt echo "BEFORE MMAP:\t$CHECKSUM_BEFORE" echo "AFTER MMAP:\t$CHECKSUM_AFTER" echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT" ----------------[END SCRIPT]-------------------- The mmaptest tool looks something like this (very simplified, with error checking removed): ----------------[BEGIN mmaptest]-------------------- data = mmap(NULL, file_size - file_offset, PROT_READ \| PROT_WRITE, MAP_SHARED, fd, file_offset); for (i = 0; i < write_count; ++i) { memcpy(data + i * 4096, buf, sizeof(buf)); msync(data, file_size - file_offset, MS_SYNC)) } ----------------[END mmaptest]-------------------- The output of the script looks something like this: BEFORE MMAP: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile AFTER MMAP: 6604a1c31f10780331a6850371b3a313 /mnt/testfile AFTER REMOUNT: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile So it is clear, that the changes done using mmap() do not survive a remount. This can be reproduced a 100% of the time. The problem was introduced in commit `136e8770cd` ("nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary"). If the page was read with mpage_readpage() or mpage_readpages() for example, then it has no buffers attached to it. In that case page_has_buffers(page) in nilfs_set_page_dirty() will be false. Therefore nilfs_set_file_dirty() is never called and the pages are never collected and never written to disk. This patch fixes the problem by also calling nilfs_set_file_dirty() if the page has no buffers attached to it. [akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/] Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Tested-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-09-26 08:10:34 -07:00
Joseph Qi	f13a568e5a	ocfs2: free vol_label in ocfs2_delete_osb() osb->vol_label is malloced in ocfs2_initialize_super but not freed if error occurs or during umount, thus causing a memory leak. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: joyce.xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-09-26 08:10:34 -07:00
David Howells	f3f760314a	NFS: Fabricate fscache server index key correctly When fabricating a server index key for fscache, we should clear the index key buffer before starting to fill it in, not in the middle. Reported-by: James Pearson <james-p@moving-picture.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-25 21:25:18 -04:00
Trond Myklebust	3fc3edf141	NFSv3: Fix missing includes of nfs3_fs.h Silence a few warnings about missing symbols that are due to missing includes of nfs3_fs.h. Fixes: `00a36a1090` (NFS: Move v3 declarations out of internal.h) Fixes: `cb8c20fa53` (NFS: Move NFS v3 acl functions to nfs3_fs.h) Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-25 16:28:53 -04:00
NeilBrown	1aff525629	NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page() Now that nfs_release_page() doesn't block indefinitely, other deadlock avoidance mechanisms aren't needed. - it doesn't hurt for kswapd to block occasionally. If it doesn't want to block it would clear __GFP_WAIT. The current_is_kswapd() was only added to avoid deadlocks and we have a new approach for that. - memory allocation in the SUNRPC layer can very rarely try to ->releasepage() a page it is trying to handle. The deadlock is removed as nfs_release_page() doesn't block indefinitely. So we don't need to set PF_FSTRANS for sunrpc network operations any more. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-25 08:25:47 -04:00
NeilBrown	353db79662	NFS: avoid waiting at all in nfs_release_page when congested. If nfs_release_page() is called on a sequence of pages which are all in the same file which is blocked on COMMIT, each page could contribute a 1 second delay which could be come excessive. I have seen delays of as much as 208 seconds. To keep the delay to one second, mark the bdi as write-congested if the commit didn't finished. Once it does finish, the write-congested flag will be cleared by nfs_commit_release_pages(). With this, the longest total delay in try_to_free_pages that I have seen is under 3 seconds. With no waiting in nfs_release_page at all I have seen delays of nearly 1.5 seconds. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-25 08:25:38 -04:00
NeilBrown	9590544694	NFS: avoid deadlocks with loop-back mounted NFS filesystems. Support for loop-back mounted NFS filesystems is useful when NFS is used to access shared storage in a high-availability cluster. If the node running the NFS server fails, some other node can mount the filesystem and start providing NFS service. If that node already had the filesystem NFS mounted, it will now have it loop-back mounted. nfsd can suffer a deadlock when allocating memory and entering direct reclaim. While direct reclaim does not write to the NFS filesystem it can send and wait for a COMMIT through nfs_release_page(). This patch modifies nfs_release_page() to wait a limited time for the commit to complete - one second. If the commit doesn't complete in this time, nfs_release_page() will fail. This means it might now fail in some cases where it wouldn't before. These cases are only when 'gfp' includes '__GFP_WAIT'. nfs_release_page() is only called by try_to_release_page(), and that can only be called on an NFS page with required 'gfp' flags from - page_cache_pipe_buf_steal() in splice.c - shrink_page_list() in vmscan.c - invalidate_inode_pages2_range() in truncate.c The first two handle failure quite safely. The last is only called after ->launder_page() has been called, and that will have waited for the commit to finish already. So aborting if the commit takes longer than 1 second is perfectly safe. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-25 08:25:28 -04:00
NeilBrown	e87b4c7a7a	NFS: don't use STABLE writes during writeback. commit `b31268ac79` FS: Use stable writes when not doing a bulk flush was a bit heavy handed. The particular problem that lead to this patch was that small writes to an O_SYNC file we being written as UNSTABLE writes followed by a commit. This is appropriate for large writes (which require multiple NFS requests) but for small writes (single NFS request), using NFS_FILE_SYNC is more efficient. So that patch causes the code to select between the two methods depending on how many nfs requests get generated. Unfortunately this ends up applying to non O_SYNC writes as well. In particular if you memory-map a file and update random pages, then when they are eventually written out by writeback they will go as NFS_FILE_SYNC. This is inefficient and slows down the application. So: only set FLUSH_COND_STABLE when wbc->sync_mode is WB_SYNC_ALL. With this patch: O_SYNC writes are NFS_FILE_SYNC for single requests, and NFS_UNSTABLE followed by COMMIT for multiple requests Writing immediately before close of fsync follow the same pattern. Non-O_SYNC writes without an fsync of close eventually get flushed out as UNSTABLE and a commit follows eventually as appropriate. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-24 23:23:02 -04:00
NeilBrown	8478eaa16e	NFSv4: use exponential retry on NFS4ERR_DELAY for async requests. Currently asynchronous NFSv4 request will be retried with exponential timeout (from 1/10 to 15 seconds), but async requests will always use a 15second retry. Some "async" requests are really synchronous though. The async mechanism is used to allow the request to continue if the requesting process is killed. In those cases, an exponential retry is appropriate. For example, if two different clients both open a file and get a READ delegation, and one client then unlinks the file (while still holding an open file descriptor), that unlink will used the "silly-rename" handling which is async. The first rename will result in NFS4ERR_DELAY while the delegation is reclaimed from the other client. The rename will not be retried for 15 seconds, causing an unlink to take 15 seconds rather than 100msec. This patch only added exponential timeout for async unlink and async rename. Other async calls, such as 'close' are sometimes waited for so they might benefit from exponential timeout too. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-24 23:22:47 -04:00
Benjamin Coddington	173b3afcee	lockd: Try to reconnect if statd has moved If rpc.statd is restarted, upcalls to monitor hosts can fail with ECONNREFUSED. In that case force a lookup of statd's new port and retry the upcall. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-24 23:08:43 -04:00
Olga Kornievskaia	8faaa6d5d4	Fixing lease renewal Commit `c9fdeb28` removed a 'continue' after checking if the lease needs to be renewed. However, if client hasn't moved, the code falls down to starting reboot recovery erroneously (ie., sends open reclaim and gets back stale_clientid error) before recovering from getting stale_clientid on the renew operation. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Fixes: `c9fdeb280b` (NFS: Add basic migration support to state manager thread) Cc: stable@vger.kernel.org # 3.13+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-24 23:03:15 -04:00
Fabian Frederick	2f3169fb18	nfs: fix duplicate proc entries Commit `65b38851a1` ("NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes") updated the following function: static int nfs_volume_list_open(struct inode inode, struct file file) it used &nfs_server_list_ops instead of &nfs_volume_list_ops which means cat /proc/fs/nfsfs/volumes = /proc/fs/nfsfs/servers Signed-off-by: Fabian Frederick <fabf@skynet.be> Fixes: `65b38851a1` (NFS: Fix /proc/fs/nfsfs/servers and...) Cc: stable@vger.kernel.org # 3.4.x+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-24 23:00:18 -04:00
Jaegeuk Kim	95dd897301	f2fs: use more free segments until SSR is activated Previously, f2fs activates SSR if the # of free segments reaches to the # of overprovisioned segments. In this case, SSR starts to use dirty segments only, so that the overprovisoned space cannot be selected for new data. This means that we have no chance to utilizae the overprovisioned space at all. This patch fixes that by allowing LFS allocations until the # of free segments reaches to the last threshold, reserved space. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:24 -07:00
Jaegeuk Kim	9b5f136fd4	f2fs: change the ipu_policy option to enable combinations This patch changes the ipu_policy setting to use any combination of orthogonal policies. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:24 -07:00
Chao Yu	210f41bc04	f2fs: fix to search whole dirty segmap when get_victim In ->get_victim we get max_search value from dirty_i->nr_dirty without protection of seglist_lock, after that, nr_dirty can be increased/decreased before we hold seglist_lock lock. Then in main loop we attempt to traverse all dirty section one time to find victim section, but it's not accurate to use max_search as the total loop count, because we might lose checking several sections or check sections redundantly for the case of nr_dirty are increased or decreased previously. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:23 -07:00
Chao Yu	26666c8a43	f2fs: fix to clean previous mount option when remount_fs In manual of mount, we descript remount as below: "mount -o remount,rw /dev/foo /dir After this call all old mount options are replaced and arbitrary stuff from fstab is ignored, except the loop= option which is internally generated and maintained by the mount command." Previously f2fs do not clear up old mount options when remount_fs, so we have no chance of disabling previous option (e.g. flush_merge). Fix it. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:22 -07:00
Chao Yu	14cecc5cd6	f2fs: skip punching hole in special condition Now punching hole in directory is not supported in f2fs, so let's limit file type in punch_hole(). In addition, in punch_hole if offset is exceed file size, we should skip punching hole. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:21 -07:00
Chao Yu	55cf9cb63f	f2fs: support large sector size Block size in f2fs is 4096 bytes, so theoretically, f2fs can support 4096 bytes sector device at maximum. But now f2fs only support 512 bytes size sector, so block device such as zRAM which uses page cache as its block storage space will not be mounted successfully as mismatch between sector size of zRAM and sector size of f2fs supported. In this patch we support large sector size in f2fs, so block device with sector size of 512/1024/2048/4096 bytes can be supported in f2fs. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:20 -07:00
Chao Yu	09db6a2ef8	f2fs: fix to truncate blocks past EOF in ->setattr By using FALLOC_FL_KEEP_SIZE in ->fallocate of f2fs, we can fallocate block past EOF without changing i_size of inode. These blocks past EOF will not be truncated in ->setattr as we truncate them only when change the file size. We should give a chance to truncate blocks out of filesize in setattr(). Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:20 -07:00
Jaegeuk Kim	976e4c50ae	f2fs: update i_size when __allocate_data_block The f2fs_direct_IO uses __allocate_data_block, but inside the allocation path, we should update i_size at the changed time to update its inode page. Otherwise, we can get wrong i_size after roll-forward recovery. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:19 -07:00
Jaegeuk Kim	90a893c749	f2fs: use MAX_BIO_BLOCKS(sbi) This patch cleans up a simple macro. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:18 -07:00
Jaegeuk Kim	c52e1b10b1	f2fs: remove redundant operation during roll-forward recovery If same data is updated multiple times, we don't need to redo whole the operations. Let's just update the lastest one. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:17 -07:00
Jaegeuk Kim	19c9c466e5	f2fs: do not skip latest inode information In f2fs_sync_file, if there is no written appended writes, it skips to write its node blocks. But, if there is up-to-date inode page, we should write it to update its metadata during the roll-forward recovery. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:16 -07:00
Jaegeuk Kim	441ac5cb32	f2fs: fix roll-forward missing scenarios We can summarize the roll forward recovery scenarios as follows. [Term] F: fsync_mark, D: dentry_mark 1. inode(x) \| CP \| inode(x) \| dnode(F) -> Update the latest inode(x). 2. inode(x) \| CP \| inode(F) \| dnode(F) -> No problem. 3. inode(x) \| CP \| dnode(F) \| inode(x) -> Recover to the latest dnode(F), and drop the last inode(x) 4. inode(x) \| CP \| dnode(F) \| inode(F) -> No problem. 5. CP \| inode(x) \| dnode(F) -> The inode(DF) was missing. Should drop this dnode(F). 6. CP \| inode(DF) \| dnode(F) -> No problem. 7. CP \| dnode(F) \| inode(DF) -> If f2fs_iget fails, then goto next to find inode(DF). 8. CP \| dnode(F) \| inode(x) -> If f2fs_iget fails, then goto next to find inode(DF). But it will fail due to no inode(DF). So, this patch adds some missing points such as #1, #5, #7, and #8. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:16 -07:00
Jaegeuk Kim	88bd02c947	f2fs: fix conditions to remain recovery information in f2fs_sync_file This patch revisited whole the recovery information during the f2fs_sync_file. In this patch, there are three information to make a decision. a) IS_CHECKPOINTED, /* is it checkpointed before? / b) HAS_FSYNCED_INODE, / is the inode fsynced before? / c) HAS_LAST_FSYNC, / has the latest node fsync mark? */ And, the scenarios for our rule are based on: [Term] F: fsync_mark, D: dentry_mark 1. inode(x) \| CP \| inode(x) \| dnode(F) 2. inode(x) \| CP \| inode(F) \| dnode(F) 3. inode(x) \| CP \| dnode(F) \| inode(x) \| inode(F) 4. inode(x) \| CP \| dnode(F) \| inode(F) 5. CP \| inode(x) \| dnode(F) \| inode(DF) 6. CP \| inode(DF) \| dnode(F) 7. CP \| dnode(F) \| inode(DF) 8. CP \| dnode(F) \| inode(x) \| inode(DF) For example, #3, the three conditions should be changed as follows. inode(x) \| CP \| dnode(F) \| inode(x) \| inode(F) a) x o o o o b) x x x x o c) x o o x o If f2fs_sync_file stops ------^, it should write inode(F) --------------^ So, the need_inode_block_update should return true, since c) get_nat_flag(e, HAS_LAST_FSYNC), is false. For example, #8, CP \| alloc \| dnode(F) \| inode(x) \| inode(DF) a) o x x x x b) x x x o c) o o x o If f2fs_sync_file stops -------^, it should write inode(DF) --------------^ Note that, the roll-forward policy should follow this rule, which means, if there are any missing blocks, we doesn't need to recover that inode. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:15 -07:00
Jaegeuk Kim	7ef35e3b9e	f2fs: introduce a flag to represent each nat entry information This patch introduces a flag in the nat entry structure to merge various information such as checkpointed and fsync_done marks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:14 -07:00
Jaegeuk Kim	4c521f493b	f2fs: use meta_inode cache to improve roll-forward speed Previously, all the dnode pages should be read during the roll-forward recovery. Even worsely, whole the chain was traversed twice. This patch removes that redundant and costly read operations by using page cache of meta_inode and readahead function as well. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:12 -07:00
Linus Torvalds	e2519c2c85	FS-Cache fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIVAwUAVB/u8ROxKuMESys7AQJGxA/9Ep4IwIAclMYcVQtk4zzvz+A9jVw0pu+X aItU70GcGoxehGlOxyqKXp/dg3CZ7ZPcxmQqzl7nzCZY28or5z/Nu6Y/YII+V0NB f6vgHPhfGO7CSSdwlSzFY/6DdU/EXhjA3X4suowRPIO3B/2ydLxDe3sahIWY/AhM 4h3ioDR+A4ZZvU8fmw62l9Iae46mkk7KT8Am/2bUJQpENu4caVEtNtyJ2IsiPtZ8 pddgvPL/f+piz4Ufdmc/XH4KB8kiLwU+7t3xfoZnSzYz9Vzcq4sQVvSxqNSaYzSC REEnELuLEweIh57PP67giZGPE6MA50kISH/soc43yl/oWj9i3cqkNb2JW+nxpvfS pIHfgRmzqUPe9MrmVUtle9c1HhjAyts3y27JB4onpLghN96JgIcTWJ1vZxnxbXzp ADVCgc3z8nhttLQMU+NsfD4fHUvj+bxULW1bSPdB5DfSrkpqmQJZBRexeFZWGyA9 p1p9Yaty2Sdq/RPmtIfMPA3cu1D9LG4wkvGPmmEAjnIlKebRBjriKDmCdwf9okWA m+fAbuBnTjc+R0ERE6Lh5OI72hbwcgyAyDx5RdWkXVIPMBuBK67nR6nJK7vPW4x0 qK0B5Pk2iwobKQMicVSxI+GImeeiQLi798KHJwN4uFBy74rIISWilIsylBrxlxK8 s2hdI83XJJI= =ojl4 -----END PGP SIGNATURE----- Merge tag 'fscache-fixes-20140917' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fs-cache fixes from David Howells: - Put a timeout in releasepage() to deal with a recursive hang between the memory allocator, writeback, ext4 and fscache under memory pressure. - Fix a pair of refcount bugs in the fscache error handling. - Remove a couple of unused pagevecs. - The cachefiles requirement that the base directory support rename should permit rename2 as an alternative - otherwise certain filesystems cannot now be used as backing stores (such as ext4). * tag 'fscache-fixes-20140917' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: CacheFiles: Handle rename2 cachefiles: remove two unused pagevecs. FS-Cache: refcount becomes corrupt under vma pressure. FS-Cache: Reduce cookie ref count if submit fails. FS-Cache: Timeout for releasepage()	2014-09-22 17:52:16 -07:00
Anton Altaparmakov	f2d5a94436	Fix nasty 32-bit overflow bug in buffer i/o code. On 32-bit architectures, the legacy buffer_head functions are not always handling the sector number with the proper 64-bit types, and will thus fail on 4TB+ disks. Any code that uses __getblk() (and thus bread(), breadahead(), sb_bread(), sb_breadahead(), sb_getblk()), and calls it using a 64-bit block on a 32-bit arch (where "long" is 32-bit) causes an inifinite loop in __getblk_slow() with an infinite stream of errors logged to dmesg like this: __find_get_block_slow() failed. block=6740375944, b_blocknr=2445408648 b_state=0x00000020, b_size=512 device sda1 blocksize: 512 Note how in hex block is 0x191C1F988 and b_blocknr is 0x91C1F988 i.e. the top 32-bits are missing (in this case the 0x1 at the top). This is because grow_dev_page() is broken and has a 32-bit overflow due to shifting the page index value (a pgoff_t - which is just 32 bits on 32-bit architectures) left-shifted as the block number. But the top bits to get lost as the pgoff_t is not type cast to sector_t / 64-bit before the shift. This patch fixes this issue by type casting "index" to sector_t before doing the left shift. Note this is not a theoretical bug but has been seen in the field on a 4TiB hard drive with logical sector size 512 bytes. This patch has been verified to fix the infinite loop problem on 3.17-rc5 kernel using a 4TB disk image mounted using "-o loop". Without this patch doing a "find /nt" where /nt is an NTFS volume causes the inifinite loop 100% reproducibly whilst with the patch it works fine as expected. Signed-off-by: Anton Altaparmakov <aia21@cantab.net> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-09-22 08:41:16 -07:00
Trond Myklebust	5466112f09	pnfs/blocklayout: Fix a 64-bit division/remainder issue in bl_map_stripe kbuild test robot reports: fs/built-in.o: In function `bl_map_stripe': >> :(.text+0x965b4): undefined reference to `__aeabi_uldivmod' >> :(.text+0x965cc): undefined reference to `__aeabi_uldivmod' >> :(.text+0x96604): undefined reference to `__aeabi_uldivmod' Fixes: `5c83746a0c` (pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing) Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-21 14:20:20 -04:00
Linus Torvalds	46be7b73e8	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "I've got a revert to fix a regression with btrfs device registration, and Filipe has part two of his fsync fix from last week" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Revert "Btrfs: device_list_add() should not update list when mounted" Btrfs: set inode's logged_trans/last_log_commit after ranged fsync	2014-09-19 13:10:53 -07:00
Linus Torvalds	81770f4144	NFS client fixes for 3.17 Highligts: - Fix an Oops in nfs4_open_and_get_state - Fix an Oops in the nfs4_state_manager - Fix another bug in the close/open_downgrade code -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUHIRLAAoJEGcL54qWCgDykZkP/jHDs/0HcK3x8jW+zbxKP6tf xfyhJGySwnTo2v0UPD1pETtQke9bWnm38RVl04wf2H4Gb7jR/BoDZ5J1C7956vuN FFXt9lcnTj2Cijn/8wz2S9GneY/mjsWf9OP7NUM3O6DgxORhdoviOnYOAzqzEXjG ylqTP/3FVglDbawKaLy3ubI0dteNxOu9U4gLveP617Ysd8h4s5XsYHPYKOOltybS HhVNf/3EdoD3lms67Zj7yPl7PtdDhNKFrS32nhnfdLLgsMiwTyb9ZYaFpK2XcD9v KDKblibH/wpQCsnReB66dKBR8P4ktTvXM1ovkb7LFUZD5tsOcb1Bp5ROzHXUSmiI sXh5Ueue0FPKExU5WFKROE43+G5KOJG5pB2RwgugsqVlZjFhGotZrIle17Zuqxz0 kVR+vGZ50O/nLQ+EoRhDRRbDBrUMT7/xxHDSPQ6d4HK2hNTbrXuSXcoe8/BvbSTt JXQCdbWDPZ5oR6z8RoBN1xHhJvXC3Y2w/d7ZzOpl3yLzsKpJ7K4tys4Z29iv3ut6 ziRS1AvJvedwSK73fWTs+zEHKm+pFMqq2U+DncvWWOWOVpIv6eKRlY9O8enP7IeW qNHj4UVYnr9w4oAhvk2WJt1TZrhzBX9NhMjHSxUCSOs5v/YeiBjPTTy40N4O0Y6Z DwKwDNxZq49ILEznntsd =EOW6 -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.17-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client fixes from Trond Myklebust: "Highligts: - fix an Oops in nfs4_open_and_get_state - fix an Oops in the nfs4_state_manager - fix another bug in the close/open_downgrade code" * tag 'nfs-for-3.17-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Fix another bug in the close/open_downgrade code NFSv4: nfs4_state_manager() vs. nfs_server_remove_lists() NFS: remove BUG possibility in nfs4_open_and_get_state	2014-09-19 13:07:49 -07:00
Linus Torvalds	d9773ceabf	Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs/smb3 fixes from Steve French: "Fixes for problems found during testing and debugging at the SMB3 storage test event (plugfest) this week" * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6: Fix mfsymlinks file size check Update version number displayed by modinfo for cifs.ko cifs: remove dead code Revert "cifs: No need to send SIGKILL to demux_thread during umount" [SMB3] Fix oops when creating symlinks on smb3 [CIFS] Fix setting time before epoch (negative time values)	2014-09-18 11:10:35 -07:00
Trond Myklebust	cd9288ffae	NFSv4: Fix another bug in the close/open_downgrade code James Drew reports another bug whereby the NFS client is now sending an OPEN_DOWNGRADE in a situation where it should really have sent a CLOSE: the client is opening the file for O_RDWR, but then trying to do a downgrade to O_RDONLY, which is not allowed by the NFSv4 spec. Reported-by: James Drews <drews@engr.wisc.edu> Link: http://lkml.kernel.org/r/541AD7E5.8020409@engr.wisc.edu Fixes: `aee7af356e` (NFSv4: Fix problems with close in the presence...) Cc: stable@vger.kernel.org # 2.6.33+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-18 13:04:22 -04:00
Steve Dickson	080af20cc9	NFSv4: nfs4_state_manager() vs. nfs_server_remove_lists() There is a race between nfs4_state_manager() and nfs_server_remove_lists() that happens during a nfsv3 mount. The v3 mount notices there is already a supper block so nfs_server_remove_lists() called which uses the nfs_client_lock spin lock to synchronize access to the client list. At the same time nfs4_state_manager() is running through the client list looking for work to do, using the same lock. When nfs4_state_manager() wins the race to the list, a v3 client pointer is found and not ignored properly which causes the panic. Moving some protocol checks before the state checking avoids the panic. CC: Stable Tree <stable@vger.kernel.org> Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-18 13:04:21 -04:00
Chris Mason	0f23ae74f5	Revert "Btrfs: device_list_add() should not update list when mounted" This reverts commit `b96de000bc`. This commit is triggering failures to mount by subvolume id in some configurations. The main problem is how many different ways this scanning function is used, both for scanning while mounted and unmounted. A proper cleanup is too big for late rcs. For now, just revert the commit and we'll put a better fix into a later merge window. Signed-off-by: Chris Mason <clm@fb.com>	2014-09-18 07:49:05 -07:00
David Howells	e2cf1f1cc7	CacheFiles: Handle rename2 Not all filesystems now provide the rename i_op - ext4 for one - but rather provide the rename2 i_op. CacheFiles checks that the filesystem has rename and so will reject ext4 now with EPERM: CacheFiles: Failed to register: -1 Fix this by checking for rename2 as an alternative. The call to vfs_rename() actually handles selection of the appropriate function, so we needn't worry about that. Turning on debugging shows: [cachef] ==> cachefiles_get_directory(,,cache) [cachef] subdir -> ffff88000b22b778 positive [cachef] <== cachefiles_get_directory() = -1 [check] where -1 is EPERM. Signed-off-by: David Howells <dhowells@redhat.com>	2014-09-17 23:29:53 +01:00
NeilBrown	696382f938	cachefiles: remove two unused pagevecs. These two have been unused since commit `c4d6d8dbf3` CacheFiles: Fix the marking of cached pages in 3.8. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: David Howells <dhowells@redhat.com>	2014-09-17 23:29:50 +01:00
Milosz Tanski	3e1199dcad	FS-Cache: refcount becomes corrupt under vma pressure. In rare cases under heavy VMA pressure the ref count for a fscache cookie becomes corrupt. In this case we decrement ref count even if we fail before incrementing the refcount. FS-Cache: Assertion failed bnode-eca5f9c6/syslog 0 > 0 is false ------------[ cut here ]------------ kernel BUG at fs/fscache/cookie.c:519! invalid opcode: 0000 [#1] SMP Call Trace: [<ffffffffa01ba060>] __fscache_relinquish_cookie+0x50/0x220 [fscache] [<ffffffffa02d64ce>] ceph_fscache_unregister_inode_cookie+0x3e/0x50 [ceph] [<ffffffffa02ae1d3>] ceph_destroy_inode+0x33/0x200 [ceph] [<ffffffff811cf67e>] ? __fsnotify_inode_delete+0xe/0x10 [<ffffffff811a9e0c>] destroy_inode+0x3c/0x70 [<ffffffff811a9f51>] evict+0x111/0x180 [<ffffffff811aa763>] iput+0x103/0x190 [<ffffffff811a5de8>] __dentry_kill+0x1c8/0x220 [<ffffffff811a5f31>] shrink_dentry_list+0xf1/0x250 [<ffffffff811a762c>] prune_dcache_sb+0x4c/0x60 [<ffffffff811930af>] super_cache_scan+0xff/0x170 [<ffffffff8113d7a0>] shrink_slab_node+0x140/0x2c0 [<ffffffff8113f2da>] shrink_slab+0x8a/0x130 [<ffffffff81142572>] balance_pgdat+0x3e2/0x5d0 [<ffffffff811428ca>] kswapd+0x16a/0x4a0 [<ffffffff810a43f0>] ? __wake_up_sync+0x20/0x20 [<ffffffff81142760>] ? balance_pgdat+0x5d0/0x5d0 [<ffffffff81083e09>] kthread+0xc9/0xe0 [<ffffffff81010000>] ? ftrace_raw_event_xen_mmu_release_ptpage+0x70/0x90 [<ffffffff81083d40>] ? flush_kthread_worker+0xb0/0xb0 [<ffffffff8159f63c>] ret_from_fork+0x7c/0xb0 [<ffffffff81083d40>] ? flush_kthread_worker+0xb0/0xb0 RIP [<ffffffffa01b984b>] __fscache_disable_cookie+0x1db/0x210 [fscache] RSP <ffff8803bc85f9b8> ---[ end trace 254d0d7c74a01f25 ]--- Signed-off-by: Milosz Tanski <milosz@adfin.com> Signed-off-by: David Howells <dhowells@redhat.com>	2014-09-17 22:41:40 +01:00
J. Bruce Fields	70b2823535	nfsd4: clarify how grace period ends The grace period is ended in two steps--first userland is notified that the grace period is now long enough that any clients who have not yet reclaimed can be safely forgotten, then we flip the switch that forbids reclaims and allows new opens. I had to think a bit to convince myself that the ordering was right here. Document it. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-17 16:33:19 -04:00
J. Bruce Fields	bea57fe45b	nfsd4: stop grace_time update at end of grace period The attempt to automatically set a new grace period time at the end of the grace period isn't really helpful. We'll probably shut down and reboot before we actually make use of the new grace period time anyway. So may as well leave it up to the init system to get this right. This just confuses people when they see /proc/fs/nfsd/nfsv4gracetime change from what they set it to. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2014-09-17 16:33:18 -04:00
Jeff Layton	65decb650a	nfsd: skip subsequent UMH "create" operations after the first one for v4.0 clients In the case of v4.0 clients, we may call into the "create" client tracking operation multiple times (once for each openowner). Upcalling for each one of those is wasteful and slow however. We can skip doing further "create" operations after the first one if we know that one has already been done. v4.1+ clients generally only call into this function once (on RECLAIM_COMPLETE), and we can't skip upcalling on the create even if the STABLE bit is set. Doing so would make it impossible for nfsdcltrack to lift the grace period early since the timestamp has a different meaning in the case where the client is expected to issue a RECLAIM_COMPLETE. Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-09-17 16:33:17 -04:00
Jeff Layton	788a7914ad	nfsd: set and test NFSD4_CLIENT_STABLE bit to reduce nfsdcltrack upcalls The nfsdcltrack upcall doesn't utilize the NFSD4_CLIENT_STABLE flag, which basically results in an upcall every time we call into the client tracking ops. Change it to set this bit on a successful "check" or "create" request, and clear it on a "remove" request. Also, check to see if that bit is set before upcalling on a "check" or "remove" request, and skip upcalling appropriately, depending on its state. Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-09-17 16:33:17 -04:00
Jeff Layton	d682e750ce	nfsd: serialize nfsdcltrack upcalls for a particular client In a later patch, we want to add a flag that will allow us to reduce the need for upcalls. In order to handle that correctly, we'll need to ensure that racing upcalls for the same client can't occur. In practice it should be rare for this to occur with a well-behaved client, but it is possible. Convert one of the bits in the cl_flags field to be an upcall bitlock, and use it to ensure that upcalls for the same client are serialized. Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-09-17 16:33:16 -04:00
Jeff Layton	d4318acd5d	nfsd: pass extra info in env vars to upcalls to allow for early grace period end In order to support lifting the grace period early, we must tell nfsdcltrack what sort of client the "create" upcall is for. We can't reliably tell if a v4.0 client has completed reclaiming, so we can only lift the grace period once all the v4.1+ clients have issued a RECLAIM_COMPLETE and if there are no v4.0 clients. Also, in order to lift the grace period, we have to tell userland when the grace period started so that it can tell whether a RECLAIM_COMPLETE has been issued for each client since then. Since this is all optional info, we pass it along in environment variables to the "init" and "create" upcalls. By doing this, we don't need to revise the upcall format. The UMH upcall can simply make use of this info if it happens to be present. If it's not then it can just avoid lifting the grace period early. Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-09-17 16:33:15 -04:00
Jeff Layton	7f5ef2e900	nfsd: add a v4_end_grace file to /proc/fs/nfsd Allow a privileged userland process to end the v4 grace period early. Writing "Y", "y", or "1" to the file will cause the v4 grace period to be lifted. The basic idea with this will be to allow the userland client tracking program to lift the grace period once it knows that no more clients will be reclaiming state. Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-09-17 16:33:14 -04:00
Jeff Layton	d68e3c4aa4	lockd: add a /proc/fs/lockd/nlm_end_grace file Add a new procfile that will allow a (privileged) userland process to end the NLM grace period early. The basic idea here will be to have sm-notify write to this file, if it sent out no NOTIFY requests when it runs. In that situation, we can generally expect that there will be no reclaim requests so the grace period can be lifted early. Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-09-17 16:33:13 -04:00
Jeff Layton	3b3e7b7223	nfsd: reject reclaim request when client has already sent RECLAIM_COMPLETE As stated in RFC 5661, section 18.51.3: Once a RECLAIM_COMPLETE is done, there can be no further reclaim operations for locks whose scope is defined as having completed recovery. Once the client sends RECLAIM_COMPLETE, the server will not allow the client to do subsequent reclaims of locking state for that scope and, if these are attempted, will return NFS4ERR_NO_GRACE. Ensure that we enforce that requirement. Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-09-17 16:33:13 -04:00
Jeff Layton	919b8049f0	nfsd: remove redundant boot_time parm from grace_done client tracking op Since it's stored in nfsd_net, we don't need to pass it in separately. Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-09-17 16:33:12 -04:00
Jeff Layton	f779002965	lockd: move lockd's grace period handling into its own module Currently, all of the grace period handling is part of lockd. Eventually though we'd like to be able to build v4-only servers, at which point we'll need to put all of this elsewhere. Move the code itself into fs/nfs_common and have it build a grace.ko module. Then, rejigger the Kconfig options so that both nfsd and lockd enable it automatically. Signed-off-by: Jeff Layton <jlayton@primarydata.com>	2014-09-17 16:33:11 -04:00
Filipe Manana	125c4cf9f3	Btrfs: set inode's logged_trans/last_log_commit after ranged fsync When a ranged fsync finishes if there are still extent maps in the modified list, still set the inode's logged_trans and last_log_commit. This is important in case an inode is fsync'ed and unlinked in the same transaction, to ensure its inode ref gets deleted from the log and the respective dentries in its parent are deleted too from the log (if the parent directory was fsync'ed in the same transaction). Instead make btrfs_inode_in_log() return false if the list of modified extent maps isn't empty. This is an incremental on top of the v4 version of the patch: "Btrfs: fix fsync data loss after a ranged fsync" which was added to its v5, but didn't make it on time. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-09-16 16:12:19 -07:00
Linus Torvalds	37504a3be9	Here are a number of small fixes for GFS2. There is a fix for FIEMAP on large sparse files, a negative dentry hashing fix, a fix for flock, and a bug fix relating to d_splice_alias usage. There are also (patches 1 and 5) a couple of updates which are less critical, but small and low risk. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) iQIcBAABAgAGBQJUGAT8AAoJEMrg3m4a/8jSevIQAJh0sfUAzDZ2p1etZw2OR2bm 0ritvFVkOB4CZ97TS7DvItoqcke+e2ehT8ykIzCNZhJotld7jsitzuQm0ugsMvG2 ltrUil2eqyeBFt1dJk9m366+akbdocVhSpVL/6hLRJI9GTbfWr26jHwUTIQisq5u Pdr42lLwwAPrj9D7GQRA1jSZLbA36teRGUuV+3CFxAWVgqG0kOmq7iGd27LZZ/hP m4hjPvdczTfhM3kjNvohicLXQiXhsJdlOZsZKfMTSGmJ2WsMRKoK2PZx2eRmDEEe ypfkboyfcHCJlbIx+x0qPRheYMC8a2TX3JMb3l6eaV0mZoeExB2kPHeBlfu1jddp XwLvbs/fbfC96Q6WPKM3JdBDY9DBf5t30Is0VxNJeMH2ERIubaLrOACeD4P7V5mX PIc3/bcDtNMOirgdd2NGxGR0TxmyhKs7pbtWC+e6Wud04qgsK1JtU8cNro0j8MRE Rfay0RY+8OjHfDn+shQYl4mh8MF2u3JF3/ThcZjtDfAB0AIhF1Bs5Ihl12B45ov1 7ah0J+8Yk7gwH++KDFNv6XGNzjpjkbBEr9FGLLN3DNDysk4ibEYP3NwEf0Vnme7A EhYv0IFgUbgdLrU3owA7rRh38A8W23cr78qRSM66/TEGYqBi9+4/DwLE4EIorGYA mKSpLXhcNtCIrQGNCz3Q =5NIi -----END PGP SIGNATURE----- Merge tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes Pull gfs2 fixes from Steven Whitehouse: "Here are a number of small fixes for GFS2. There is a fix for FIEMAP on large sparse files, a negative dentry hashing fix, a fix for flock, and a bug fix relating to d_splice_alias usage. There are also (patches 1 and 5) a couple of updates which are less critical, but small and low risk" * tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes: GFS2: fix d_splice_alias() misuses GFS2: Don't use MAXQUOTAS value GFS2: Hash the negative dentry during inode lookup GFS2: Request demote when a "try" flock fails GFS2: Change maxlen variables to size_t GFS2: fs/gfs2/super.c: replace seq_printf by seq_puts	2014-09-16 07:47:04 -07:00
James Hogan	a060dc5010	vfs: workaround gcc <4.6 build error in link_path_walk() Commit `d6bb3e9075` ("vfs: simplify and shrink stack frame of link_path_walk()") introduced build problems with GCC versions older than 4.6 due to the initialisation of a member of an anonymous union in struct qstr without enclosing braces. This hits GCC bug 10676 [1] (which was fixed in GCC 4.6 by [2]), and causes the following build error: fs/namei.c: In function 'link_path_walk': fs/namei.c:1778: error: unknown field 'hash_len' specified in initializer This is worked around by adding explicit braces. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10676 [2] https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=159206 Fixes: `d6bb3e9075` (vfs: simplify and shrink stack frame of link_path_walk()) Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-metag@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-09-16 07:44:54 -07:00
Steve French	364d42930d	Fix mfsymlinks file size check If the mfsymlinks file size has changed (e.g. the file no longer represents an emulated symlink) we were not returning an error properly. Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Stefan Metzmacher <metze@samba.org>	2014-09-16 06:48:20 -05:00
Jaegeuk Kim	60979115a6	f2fs: fix double lock for inode page during roll-foward recovery If the inode is same and its data index are needed to truncate, we can fall into double lock for its inode page via get_dnode_of_data. Error case is like this. 1. write data 1, 2, 3, 4, 5 in inode #4. 2. write data 100, 102, 103, 104, 105 in dnode #6 of inode #4. 3. sync 4. update data 100->106 in dnode #6. 5. fsync inode #4. 6. power-cut -> Then, 1. go back to #3's checkpoint 2. in do_recover_data, get_dnode_of_data() gets inode #4. 3. detect 100->106 in dnode #6. 4. check_index_in_prev_nodes tries to truncate 100 in dnode #6. 5. to trigger truncate_hole, get_dnode_of_data should grab inode #4. 6. detect kernel hang This patch should resolve that bug. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-16 04:10:47 -07:00
Huang Ying	c6e489305e	f2fs: fix a race condition in next_free_nid The nm_i->fcnt checking is executed before spin_lock, so if another thread delete the last free_nid from the list, the wrong nid may be gotten. So fix the race condition by moving the nm_i->fnct checking into spin_lock. Signed-off-by: Huang, Ying <ying.huang@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-16 04:10:46 -07:00
Huang Ying	7704182387	f2fs: use nm_i->next_scan_nid as default for next_free_nid Now, if there is no free nid in nm_i->free_nid_list, 0 may be saved into next_free_nid of checkpoint, this may cause useless scanning for next mount. nm_i->next_scan_nid should be a better default value than 0. Signed-off-by: Huang, Ying <ying.huang@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-16 04:10:45 -07:00
Jaegeuk Kim	c1ce1b02bb	f2fs: give an option to enable in-place-updates during fsync to users If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file only starts to try in-place-updates. And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it keeps out-of-order manner. Otherwise, it triggers in-place-updates. This may be used by storage showing very high random write performance. For example, it can be used when, Seq. writes (Data) + wait + Seq. writes (Node) is pretty much slower than, Rand. writes (Data) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-16 04:10:44 -07:00
Jaegeuk Kim	a7ffdbe22c	f2fs: expand counting dirty pages in the inode page cache Previously f2fs only counts dirty dentry pages, but there is no reason not to expand the scope. This patch changes the names on the management of dirty pages and to count dirty pages in each inode info as well. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-16 04:10:39 -07:00
Steve French	69af38dbc5	Update version number displayed by modinfo for cifs.ko Update cifs.ko version to 2.05 Signed-off-by: Steve French <smfrench@gmail.com>w	2014-09-16 05:31:01 -05:00
Arnd Bergmann	116ae5e2b0	cifs: remove dead code cifs provides two dummy functions 'sess_auth_lanman' and 'sess_auth_kerberos' for the case in which the respective features are not defined. However, the caller is also under an #ifdef, so we just get warnings about unused code: fs/cifs/sess.c:1109:1: warning: 'sess_auth_kerberos' defined but not used [-Wunused-function] sess_auth_kerberos(struct sess_data *sess_data) Removing the dead functions gets rid of the warnings without any downsides that I can see. (Yalin Wang reported the identical problem and fix so added him) Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Yalin Wang <yalin.wang@sonymobile.com> Signed-off-by: Steve French <smfrench@gmail.com>	2014-09-16 05:30:11 -05:00
Steve French	a5c3e1c725	Revert "cifs: No need to send SIGKILL to demux_thread during umount" This reverts commit `52a3624444`. Causes rmmod to fail for at least 7 seconds after unmount which makes automated testing a little harder when reloading cifs.ko between test runs. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> CC: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Steve French <smfrench@gmail.com>	2014-09-16 05:26:24 -05:00
Stephen Rothwell	b262b35c2c	pnfs/blocklayout: include vmalloc.h for __vmalloc Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2014-09-15 19:33:28 -04:00
Linus Torvalds	d6bb3e9075	vfs: simplify and shrink stack frame of link_path_walk() Commit `9226b5b440` ("vfs: avoid non-forwarding large load after small store in path lookup") made link_path_walk() always access the "hash_len" field as a single 64-bit entity, in order to avoid mixed size accesses to the members. However, what I didn't notice was that that effectively means that the whole "struct qstr this" is now basically redundant. We already explicitly track the "const char *name", and if we just use "u64 hash_len" instead of "long len", there is nothing else left of the "struct qstr". We do end up wanting the "struct qstr" if we have a filesystem with a "d_hash()" function, but that's a rare case, and we might as well then just squirrell away the name and hash_len at that point. End result: fewer live variables in the loop, a smaller stack frame, and better code generation. And we don't need to pass in pointers variables to helper functions any more, because the return value contains all the relevant information. So this removes more lines than it adds, and the source code is clearer too. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-09-15 10:51:07 -07:00
Steve French	da80659d4a	[SMB3] Fix oops when creating symlinks on smb3 We were not checking for symlink support properly for SMB2/SMB3 mounts so could oops when mounted with mfsymlinks when try to create symlink when mfsymlinks on smb2/smb3 mounts Signed-off-by: Steve French <smfrench@gmail.com> Cc: <stable@vger.kernel.org> # 3.14+ CC: Sachin Prabhu <sprabhu@redhat.com>	2014-09-15 03:04:50 -05:00
Linus Torvalds	83373f7028	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "double iput() on failure exit in lustre, racy removal of spliced dentries from ->s_anon in __d_materialise_dentry() plus a bunch of assorted RCU pathwalk fixes" The RCU pathwalk fixes end up fixing a couple of cases where we incorrectly dropped out of RCU walking, due to incorrect initialization and testing of the sequence locks in some corner cases. Since dropping out of RCU walk mode forces the slow locked accesses, those corner cases slowed down quite dramatically. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: be careful with nd->inode in path_init() and follow_dotdot_rcu() don't bugger nd->seq on set_root_rcu() from follow_dotdot_rcu() fix bogus read_seqretry() checks introduced in `b37199e` move the call of __d_drop(anon) into __d_materialise_unique(dentry, anon) [fix] lustre: d_make_root() does iput() on dentry allocation failure	2014-09-14 17:37:36 -07:00
Linus Torvalds	9226b5b440	vfs: avoid non-forwarding large load after small store in path lookup The performance regression that Josef Bacik reported in the pathname lookup (see commit `99d263d4c5` "vfs: fix bad hashing of dentries") made me look at performance stability of the dcache code, just to verify that the problem was actually fixed. That turned up a few other problems in this area. There are a few cases where we exit RCU lookup mode and go to the slow serializing case when we shouldn't, Al has fixed those and they'll come in with the next VFS pull. But my performance verification also shows that link_path_walk() turns out to have a very unfortunate 32-bit store of the length and hash of the name we look up, followed by a 64-bit read of the combined hash_len field. That screws up the processor store to load forwarding, causing an unnecessary hickup in this critical routine. It's caused by the ugly calling convention for the "hash_name()" function, and easily fixed by just making hash_name() fill in the whole 'struct qstr' rather than passing it a pointer to just the hash value. With that, the profile for this function looks much smoother. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-09-14 17:28:32 -07:00
Steve French	2ae83bf938	[CIFS] Fix setting time before epoch (negative time values) xfstest generic/258 sets the time on a file to a negative value (before 1970) which fails since do_div can not handle negative numbers. In addition 'normal' division of 64 bit values does not build on 32 bit arch so have to workaround this by special casing negative values in cifs_NTtimeToUnix Samba server also has a bug with this (see samba bugzilla 7771) but it works to Windows server. Signed-off-by: Steve French <smfrench@gmail.com>	2014-09-14 17:06:36 -05:00
Al Viro	4023bfc9f3	be careful with nd->inode in path_init() and follow_dotdot_rcu() in the former we simply check if dentry is still valid after picking its ->d_inode; in the latter we fetch ->d_inode in the same places where we fetch dentry and its ->d_seq, under the same checks. Cc: stable@vger.kernel.org # 2.6.38+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-09-14 14:24:47 -04:00
Al Viro	7bd88377d4	don't bugger nd->seq on set_root_rcu() from follow_dotdot_rcu() return the value instead, and have path_init() do the assignment. Broken by "vfs: Fix absolute RCU path walk failures due to uninitialized seq number", which was Cc-stable with 2.6.38+ as destination. This one should go where it went. To avoid dummy value returned in case when root is already set (it would do no harm, actually, since the only caller that doesn't ignore the return value is guaranteed to have nd->root not set, but it's more obvious that way), lift the check into callers. And do the same to set_root(), to keep them in sync. Cc: stable@vger.kernel.org # 2.6.38+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-09-14 14:19:44 -04:00
Al Viro	f5be3e2912	fix bogus read_seqretry() checks introduced in `b37199e` read_seqretry() returns true on mismatch, not on match... Cc: stable@vger.kernel.org # 3.15+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-09-13 22:14:16 -04:00

1 2 3 4 5 ...

37850 Commits