linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-24 13:11:40 +00:00

Author	SHA1	Message	Date
Trond Myklebust	ea9afca88b	SUNRPC: Replace use of socket sk_callback_lock with sock_lock Since we do things like setting flags, etc it really is more appropriate to use sock_lock(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-11-01 11:00:30 -04:00
Trond Myklebust	01d29f87fc	NFSv4: Fix a regression in nfs_set_open_stateid_locked() If we already hold open state on the client, yet the server gives us a completely different stateid to the one we already hold, then we currently treat it as if it were an out-of-sequence update, and wait for 5 seconds for other updates to come in. This commit fixes the behaviour so that we immediately start processing of the new stateid, and then leave it to the call to nfs4_test_and_free_stateid() to decide what to do with the old stateid. Fixes: `b4868b44c5` ("NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-27 15:24:46 -04:00
Trond Myklebust	4cd27df88a	NFS: Remove redundant call to __set_page_dirty_nobuffers Remove a redundant call in nfs_updatepage(). nfs_writepage_setup() will have already called nfs_mark_request_dirty() on success. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-21 17:11:41 -04:00
Thiago Rafael Becker	023859ce6f	sunrpc: remove unnecessary test in rpc_task_set_client() In rpc_task_set_client(), testing for a NULL clnt is not necessary, as clnt should always be a valid pointer to a rpc_client. Signed-off-by: Thiago Rafael Becker <trbecker@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-20 18:09:55 -04:00
Anna Schumaker	5fe1210d25	NFS: Unexport nfs_probe_fsinfo() All the callers are now in client.c so we can remove the EXPORT_SYMBOL_GPL() and make it static. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-20 18:09:55 -04:00
Anna Schumaker	1301ba603c	NFS: Call nfs_probe_server() during a fscontext-reconfigure event This lets us update the server's attributes when the user does a "mount -o remount" on the filesystem. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-20 18:09:55 -04:00
Anna Schumaker	4d4cf8d2d6	NFS: Replace calls to nfs_probe_fsinfo() with nfs_probe_server() Clean up. There are a few places where we want to probe the server, but don't actually care about the fsinfo result. Change these to use nfs_probe_server(), which handles the fattr allocation for us. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-20 18:09:54 -04:00
Anna Schumaker	e5731131fb	NFS: Move nfs_probe_destination() into the generic client And rename it to nfs_probe_server(). I also change it to take the nfs_fh as an argument so callers can choose what filehandle to probe. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-20 18:09:54 -04:00
Anna Schumaker	01dde76e47	NFS: Create an nfs4_server_set_init_caps() function And call it before doing an FSINFO probe to reset to the baseline capabilities before probing. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-20 18:09:54 -04:00
Chuck Lever	86882c7546	NFS: Remove --> and <-- dprintk call sites dprintk call sites that display no other information than the function name can be replaced with use of the trace "function" or "function_graph" plug-ins. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-20 18:09:54 -04:00
Chuck Lever	b40887e10d	SUNRPC: Trace calls to .rpc_call_done Introduce a single tracepoint that can replace simple dprintk call sites in upper layer "rpc_call_done" callbacks. Example: kworker/u24:2-1254 [001] 771.026677: rpc_stats_latency: task:00000001@00000002 xid=0x16a6f3c0 rpcbindv2 GETPORT backlog=446 rtt=101 execute=555 kworker/u24:2-1254 [001] 771.026677: rpc_task_call_done: task:00000001@00000002 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpcb_getport_done kworker/u24:2-1254 [001] 771.026678: rpcb_setport: task:00000001@00000002 status=0 port=20048 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-20 18:09:54 -04:00
Chuck Lever	d9f877433e	NFS: Replace dprintk callsites in nfs_readpage(s) These new events report slightly different information for readpage and readpages/readahead. For readpage: fsx-1387 [006] 380.761896: nfs_aop_readpage: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355910932437 offset=131072 fsx-1387 [006] 380.761900: nfs_aop_readpage_done: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355910932437 offset=131072 ret=0 The index of a synchronous single-page read is reported. For readpages: fsx-1387 [006] 380.760847: nfs_aop_readahead: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355909932456 nr_pages=3 fsx-1387 [006] 380.760853: nfs_aop_readahead_done: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355909932456 nr_pages=3 ret=0 The count of pages requested is reported. nfs_readpages does not wait for the READ requests to complete. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-20 18:09:54 -04:00
Chuck Lever	76497b1adb	SUNRPC: Use BIT() macro in rpc_show_xprt_state() Clean up: BIT() is preferred over open-coding the shift. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-20 18:09:54 -04:00
Chuck Lever	b4776a341e	SUNRPC: Tracepoints should display tk_pid and cl_clid as a fixed-size field For certain special cases, RPC-related tracepoints record a -1 as the task ID or the client ID. It's ugly for a trace event to display 4 billion in these cases. To help keep SUNRPC tracepoints consistent, create a macro that defines the print format specifiers for tk_pid and cl_clid. At some point in the future we might try tk_pid with a wider range of values than 0..64K so this makes it easier to make that change. RPC tracepoints now look like this: <...>-1276 [009] 149.720358: rpc_clnt_new: client=00000005 peer=[192.168.2.55]:20049 program=nfs server=klimt.ib <...>-1342 [004] 149.921234: rpc_xdr_recvfrom: task:0000001a@00000005 head=[0xff1242d9ab6dc01c,144] page=0 tail=[(nil),0] len=144 <...>-1342 [004] 149.921235: xprt_release_cong: task:0000001a@00000005 snd_task:ffffffff cong=256 cwnd=16384 <...>-1342 [004] 149.921235: xprt_put_cong: task:0000001a@00000005 snd_task:ffffffff cong=0 cwnd=16384 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-20 18:09:54 -04:00
Chuck Lever	7a3d524c4c	xprtrdma: Remove rpcrdma_ep::re_implicit_roundup Clean up: this field is no longer used. xprt_rdma_pad_optimize is also no longer used, but is left in place because it is part of the kernel/userspace API. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-20 18:09:54 -04:00
Chuck Lever	21037b8c22	xprtrdma: Provide a buffer to pad Write chunks of unaligned length This is a buffer to be left persistently registered while a connection is up. Connection tear-down will automatically DMA-unmap, invalidate, and dereg the MR. A persistently registered buffer is lower in cost to provide, and it can never be coalesced into the RDMA segment that carries the data payload. An RPC that provisions a Write chunk with a non-aligned length now uses this MR rather than the tail buffer of the RPC's rq_rcv_buf. Reviewed-By: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-20 18:09:54 -04:00
Alexey Gladkov	d5f458a979	Fix user namespace leak Fixes: `61ca2c4afd` ("NFS: Only reference user namespace from nfs4idmap struct instead of cred") Signed-off-by: Alexey Gladkov <legion@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-20 18:09:54 -04:00
Trond Myklebust	e591b298d7	NFS: Save some space in the inode Save some space in the nfs_inode by setting up an anonymous union with the fields that are peculiar to a specific type of filesystem object. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-20 18:09:54 -04:00
Dave Wysochanski	0ebeebcf59	NFS: Fix WARN_ON due to unionization of nfs_inode.nrequests Fixes the following WARN_ON WARNING: CPU: 2 PID: 18678 at fs/nfs/inode.c:123 nfs_clear_inode+0x3b/0x50 [nfs] ... Call Trace: nfs4_evict_inode+0x57/0x70 [nfsv4] evict+0xd1/0x180 dispose_list+0x48/0x60 evict_inodes+0x156/0x190 generic_shutdown_super+0x37/0x110 nfs_kill_super+0x1d/0x40 [nfs] deactivate_locked_super+0x36/0xa0 Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-20 18:09:54 -04:00
Trond Myklebust	6e176d4716	NFSv4: Fixes for nfs4_inode_return_delegation() We mustn't call nfs_wb_all() on anything other than a regular file. Furthermore, we can exit early when we don't hold a delegation. Reported-by: David Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-20 18:09:53 -04:00
Trond Myklebust	f0caea8882	NFS: Fix an Oops in pnfs_mark_request_commit() Olga reports seeing the following Oops when doing O_DIRECT writes to a pNFS flexfiles server: Oops: 0000 [#1] SMP PTI CPU: 1 PID: 234186 Comm: kworker/u8:1 Not tainted 5.15.0-rc4+ #4 Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014 Workqueue: nfsiod rpc_async_release [sunrpc] RIP: 0010:nfs_mark_request_commit+0x12/0x30 [nfs] Code: ff ff be 03 00 00 00 e8 ac 34 83 eb e9 29 ff ff ff e8 22 bc d7 eb 66 90 0f 1f 44 00 00 48 85 f6 74 16 48 8b 42 10 48 8b 40 18 <48> 8b 40 18 48 85 c0 74 05 e9 70 fc 15 ec 48 89 d6 e9 68 ed ff ff RSP: 0018:ffffa82f0159fe00 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff8f3393141880 RCX: 0000000000000000 RDX: ffffa82f0159fe08 RSI: ffff8f3381252500 RDI: ffff8f3393141880 RBP: ffff8f33ac317c00 R08: 0000000000000000 R09: ffff8f3487724cb0 R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000001 R13: ffff8f3485bccee0 R14: ffff8f33ac317c10 R15: ffff8f33ac317cd8 FS: 0000000000000000(0000) GS:ffff8f34fbc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 0000000122120006 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: nfs_direct_write_completion+0x13b/0x250 [nfs] rpc_free_task+0x39/0x60 [sunrpc] rpc_async_release+0x29/0x40 [sunrpc] process_one_work+0x1ce/0x370 worker_thread+0x30/0x380 ? process_one_work+0x370/0x370 kthread+0x11a/0x140 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 Reported-by: Olga Kornievskaia <aglo@umich.edu> Fixes: `9c455a8c1e` ("NFS/pNFS: Clean up pNFS commit operations") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-20 18:09:53 -04:00
Trond Myklebust	133a48abf6	NFS: Fix up commit deadlocks If O_DIRECT bumps the commit_info rpcs_out field, then that could lead to fsync() hangs. The fix is to ensure that O_DIRECT calls nfs_commit_end(). Fixes: `723c921e7d` ("sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-20 18:09:45 -04:00
Trond Myklebust	64a93dbf25	NFS: Fix deadlocks in nfs_scan_commit_list() Partially revert commit `2ce209c42c` ("NFS: Wait for requests that are locked on the commit list"), since it can lead to deadlocks between commit requests and nfs_join_page_group(). For now we should assume that any locked requests on the commit list are either about to be removed and committed by another task, or the writes they describe are about to be retransmitted. In either case, we should not need to worry. Fixes: `2ce209c42c` ("NFS: Wait for requests that are locked on the commit list") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-10 11:05:54 +02:00
Chuck Lever	110cb2d2f9	NFS: Instrument i_size_write() Generate a trace event whenever the NFS client modifies the size of a file. These new events aid troubleshooting workloads that trigger races around size updates. There are four new trace points, all named nfs_size_something so they are easy to grep for or enable as a group with a single glob. Size updated on the server: kworker/u24:10-194 [010] 369.939174: nfs_size_update: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899344277980615 cursize=250471 newsize=172083 Server-side size update reported via NFSv3 WCC attributes: fsx-1387 [006] 380.760686: nfs_size_wcc: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355909932456 cursize=146792 newsize=171216 File has been truncated locally: fsx-1387 [007] 369.437421: nfs_size_truncate: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899231200117272 cursize=215244 newsize=0 File has been extended locally: fsx-1387 [007] 369.439213: nfs_size_grow: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899343704248410 cursize=258048 newsize=262144 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-10 11:05:54 +02:00
Chuck Lever	0392dd51f9	SUNRPC: Per-rpc_clnt task PIDs The current range of RPC task PIDs is 0..65535. That's not adequate for distinguishing tasks across multiple rpc_clnts running high throughput workloads. To help relieve this situation and to reduce the bottleneck of having a single atomic for assigning all RPC task PIDs, assign task PIDs per rpc_clnt. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-10 11:05:54 +02:00
Chuck Lever	8e09650f5e	NFS: Remove unnecessary TRACE_DEFINE_ENUM()s Clean up: TRACE_DEFINE_ENUM is unnecessary because the target symbols are all C macros, not enums. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-10 11:05:54 +02:00
Baptiste Lepers	a2915fa062	pnfs/flexfiles: Fix misplaced barrier in nfs4_ff_layout_prepare_ds _nfs4_pnfs_v3/v4_ds_connect do some work smp_wmb ds->ds_clp = clp; And nfs4_ff_layout_prepare_ds currently does smp_rmb if(ds->ds_clp) ... This patch places the smp_rmb after the if. This ensures that following reads only happen once nfs4_ff_layout_prepare_ds has checked that data has been properly initialized. Fixes: `d67ae825a5` ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-03 23:00:50 -04:00
Trond Myklebust	36a10a3c4c	NFS: Remove unnecessary page cache invalidations Remove cache invalidations that are already covered by change attribute updates. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-03 20:49:07 -04:00
Trond Myklebust	b97583b263	NFS: Do not flush the readdir cache in nfs_dentry_iput() The original premise in commit `83672d392f` ("NFS: Fix directory caching problem - with test case and patch.") was that readdirplus was caching attribute information and replaying it later. This is no longer the case. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-03 20:49:07 -04:00
Trond Myklebust	cec08f452a	NFS: Fix dentry verifier races If the directory changed while we were revalidating the dentry, then don't update the dentry verifier. There is no value in setting the verifier to an older value, and we could end up overwriting a more up to date verifier from a parallel revalidation. Fixes: `efeda80da3` ("NFSv4: Fix revalidation of dentries with delegations") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com>	2021-10-03 20:49:07 -04:00
Trond Myklebust	ff81dfb5d7	NFS: Further optimisations for 'ls -l' If a user is doing 'ls -l', we have a heuristic in GETATTR that tells the readdir code to try to use READDIRPLUS in order to refresh the inode attributes. In certain circumstances, we also try to invalidate the remaining directory entries in order to ensure this refresh. If there are multiple readers of the directory, we probably should avoid invalidating the page cache, since the heuristic breaks down in that situation anyway. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com>	2021-10-03 20:49:07 -04:00
Trond Myklebust	2929bc3329	NFS: Fix up nfs_readdir_inode_mapping_valid() The check for duplicate readdir cookies should only care if the change attribute is invalid or the data cache is invalid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com>	2021-10-03 20:49:07 -04:00
Trond Myklebust	a6a361c4ca	NFS: Ignore the directory size when marking for revalidation If we want to revalidate the directory, then just mark the change attribute as invalid. Fixes: `13c0b082b6` ("NFS: Replace use of NFS_INO_REVAL_PAGECACHE when checking cache validity") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com>	2021-10-03 20:49:06 -04:00
Trond Myklebust	488796ec1e	NFS: Don't set NFS_INO_DATA_INVAL_DEFER and NFS_INO_INVALID_DATA NFS_INO_DATA_INVAL_DEFER and NFS_INO_INVALID_DATA should be considered mutually exclusive. Fixes: `1c341b7775` ("NFS: Add deferred cache invalidation for close-to-open consistency violations") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com>	2021-10-03 20:49:06 -04:00
Trond Myklebust	eea413308f	NFS: Default change_attr_type to NFS4_CHANGE_TYPE_IS_UNDEFINED Both NFSv3 and NFSv2 generate their change attribute from the ctime value that was supplied by the server. However the problem is that there are plenty of servers out there with ctime resolutions of 1ms or worse. In a modern performance system, this is insufficient when trying to decide which is the most recent set of attributes when, for instance, a READ or GETATTR call races with a WRITE or SETATTR. For this reason, let's revert to labelling the NFSv2/v3 change attributes as NFS4_CHANGE_TYPE_IS_UNDEFINED. This will ensure we protect against such races. Fixes: `7b24dacf08` ("NFS: Another inode revalidation improvement") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>	2021-10-03 20:49:06 -04:00
Trond Myklebust	a1e7f30a86	NFSv4: Retrieve ACCESS on open if we're not using NFS4_CREATE_EXCLUSIVE NFS4_CREATE_EXCLUSIVE does not allow the caller to set an access mode, so for most Linux filesystems, the access call ends up returning no permissions. However both NFS4_CREATE_EXCLUSIVE4_1 and NFS4_CREATE_GUARDED allow the client to set the access mode. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-03 20:49:06 -04:00
Trond Myklebust	43d20e80e2	NFS: Fix a few more clear_bit() instances that need release semantics All these bits are being used as bit locks. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-03 20:49:06 -04:00
Trond Myklebust	33c3214bf4	SUNRPC: xprt_clear_locked() only needs release memory semantics The clearing of the XPRT_LOCKED bit has to happen after we clear xprt->snd_task, but we don't require any extra memory barriers after that. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-03 20:49:06 -04:00
Trond Myklebust	b9f8713f42	SUNRPC: Remove unnecessary memory barriers The only check for RPC_TASK_RUNNING is the one in rpc_make_runnable(), which happens under the same spin lock held when we call rpc_clear_running(). Ditto, the last check for RPC_TASK_QUEUED in rpc_execute() is performed under the same lock as the one held when we call rpc_clear_queued(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-03 20:49:05 -04:00
Trond Myklebust	6dbcbe3f78	SUNRPC: Remove WQ_HIGHPRI from xprtiod Don't let xprtiod pre-empt softirq. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-03 20:49:05 -04:00
Trond Myklebust	47dd8796a3	SUNRPC: Add cond_resched() at the appropriate point in __rpc_execute() Allow tasks that need to pre-empt rpciod/xprtiod to do so when it is safe. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-03 20:49:05 -04:00
Trond Myklebust	ea7a1019d8	SUNRPC: Partial revert of commit `6f9f17287e` The premise of commit `6f9f17287e` ("SUNRPC: Mitigate cond_resched() in xprt_transmit()") was that cond_resched() is expensive and unnecessary when there has been just a single send. The point of cond_resched() is to ensure that tasks that should pre-empt this one get a chance to do so when it is safe to do so. The code prior to commit `6f9f17287e` failed to take into account that it was keeping a rpc_task pinned for longer than it needed to, and so rather than doing a full revert, let's just move the cond_resched. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-03 20:49:05 -04:00
Trond Myklebust	ca05cbae2a	NFS: Fix up nfs_ctx_key_to_expire() If the cached credential exists but doesn't have any expiration callback then exit early. Fix up atomicity issues when replacing the credential with a new one since the existing code could lead to refcount leaks. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-03 20:49:05 -04:00
Trond Myklebust	9019fb391d	NFS: Label the dentry with a verifier in nfs_rmdir() and nfs_unlink() After the success of an operation such as rmdir() or unlink(), we expect to add the dentry back to the dcache as an ordinary negative dentry. However in NFS, unless it is labelled with the appropriate verifier for the parent directory state, then nfs_lookup_revalidate will end up discarding that dentry and forcing a new lookup. The fix is to ensure that we relabel the dentry appropriately on success. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-03 20:49:05 -04:00
Trond Myklebust	342a67f088	NFS: Label the dentry with a verifier in nfs_link(), nfs_symlink() After the success of an operation such as link(), or symlink(), we expect to add the dentry back to the dcache as an ordinary positive dentry. However in NFS, unless it is labelled with the appropriate verifier for the parent directory state, then nfs_lookup_revalidate will end up discarding that dentry and forcing a new lookup. The fix is to ensure that we relabel the dentry appropriately on success. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-10-03 20:49:05 -04:00
Linus Torvalds	9e1ff307c7	Linux 5.15-rc4	2021-10-03 14:08:47 -07:00
Chen Jingwen	9b2f72cc0a	elf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings In commit `b212921b13` ("elf: don't use MAP_FIXED_NOREPLACE for elf executable mappings") we still leave MAP_FIXED_NOREPLACE in place for load_elf_interp. Unfortunately, this will cause kernel to fail to start with: 1 (init): Uhuuh, elf segment at 00003ffff7ffd000 requested but the memory is mapped already Failed to execute /init (error -17) The reason is that the elf interpreter (ld.so) has overlapping segments. readelf -l ld-2.31.so Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flags Align LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x000000000002c94c 0x000000000002c94c R E 0x10000 LOAD 0x000000000002dae0 0x000000000003dae0 0x000000000003dae0 0x00000000000021e8 0x0000000000002320 RW 0x10000 LOAD 0x000000000002fe00 0x000000000003fe00 0x000000000003fe00 0x00000000000011ac 0x0000000000001328 RW 0x10000 The reason for this problem is the same as described in commit `ad55eac74f` ("elf: enforce MAP_FIXED on overlaying elf segments"). Not only executable binaries, elf interpreters (e.g. ld.so) can have overlapping elf segments, so we better drop MAP_FIXED_NOREPLACE and go back to MAP_FIXED in load_elf_interp. Fixes: `4ed2863951` ("fs, elf: drop MAP_FIXED usage from elf_map") Cc: <stable@vger.kernel.org> # v4.19 Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-10-03 14:02:58 -07:00
Linus Torvalds	ca3cef466f	Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix a number of ext4 bugs in fast_commit, inline data, and delayed allocation. Also fix error handling code paths in ext4_dx_readdir() and ext4_fill_super(). Finally, avoid grabbing a journal head in the delayed allocation write in the common cases where we are overwriting a pre-existing block or appending to an inode" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: recheck buffer uptodate bit under buffer lock ext4: fix potential infinite loop in ext4_dx_readdir() ext4: flush s_error_work before journal destroy in ext4_fill_super ext4: fix loff_t overflow in ext4_max_bitmap_size() ext4: fix reserved space counter leakage ext4: limit the number of blocks in one ADD_RANGE TLV ext4: enforce buffer head state assertion in ext4_da_map_blocks ext4: remove extent cache entries when truncating inline data ext4: drop unnecessary journal handle in delalloc write ext4: factor out write end code of inline file ext4: correct the error path of ext4_write_inline_data_end() ext4: check and update i_disksize properly ext4: add error checking to ext4_ext_replay_set_iblocks()	2021-10-03 13:56:53 -07:00
Linus Torvalds	7fab1c12bd	objtool: print out the symbol type when complaining about it The objtool warning that the kvm instruction emulation code triggered wasn't very useful: arch/x86/kvm/emulate.o: warning: objtool: __ex_table+0x4: don't know how to handle reloc symbol type: kvm_fastop_exception in that it helpfully tells you which symbol name it had trouble figuring out the relocation for, but it doesn't actually say what the unknown symbol type was that triggered it all. In this case it was because of missing type information (type 0, aka STT_NOTYPE), but on the whole it really should just have printed that out as part of the message. Because if this warning triggers, that's very much the first thing you want to know - why did reloc2sec_off() return failure for that symbol? So rather than just saying you can't handle some type of symbol without saying what the type _was_, just print out the type number too. Fixes: `24ff652573` ("objtool: Teach get_alt_entry() about more relocation types") Link: https://lore.kernel.org/lkml/CAHk-=wiZwq-0LknKhXN4M+T8jbxn_2i9mcKpO+OaBSSq_Eh7tg@mail.gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-10-03 13:45:48 -07:00
Linus Torvalds	291073a566	kvm: fix objtool relocation warning The recent change to make objtool aware of more symbol relocation types (commit `24ff652573`: "objtool: Teach get_alt_entry() about more relocation types") also added another check, and resulted in this objtool warning when building kvm on x86: arch/x86/kvm/emulate.o: warning: objtool: __ex_table+0x4: don't know how to handle reloc symbol type: kvm_fastop_exception The reason seems to be that kvm_fastop_exception() is marked as a global symbol, which causes the relocation to be kept around for objtool. And at the same time, the kvm_fastop_exception definition (which is done as an inline asm statement) doesn't actually set the type of the global, which then makes objtool unhappy. The minimal fix is to just not mark kvm_fastop_exception as being a global symbol. It's only used in that one compilation unit anyway, so it was always pointless. That's how all the other local exception table labels are done. I'm not entirely happy about the kinds of games that the kvm code plays with doing its own exception handling, and the fact that it confused objtool is most definitely a symptom of the code being a bit too subtle and ad-hoc. But at least this trivial one-liner makes objtool no longer upset about what is going on. Fixes: `24ff652573` ("objtool: Teach get_alt_entry() about more relocation types") Link: https://lore.kernel.org/lkml/CAHk-=wiZwq-0LknKhXN4M+T8jbxn_2i9mcKpO+OaBSSq_Eh7tg@mail.gmail.com/ Cc: Borislav Petkov <bp@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-10-03 13:34:19 -07:00
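
The barrier placement described in commit `a2915fa062` ("pnfs/flexfiles: Fix misplaced barrier in nfs4_ff_layout_prepare_ds") can be illustrated with a small user-space sketch. This is not the kernel code: it assumes C11 atomics as a stand-in for smp_wmb()/smp_rmb(), and the names `struct client`, `struct data_server`, `session_id`, `ds_publish()` and `ds_session_id()` are hypothetical. The point it shows is the ordering from the commit message: the writer does its work, issues the write barrier, then publishes the pointer; the reader checks the pointer first and only then issues the read barrier before touching the data.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the data server and its client handle. */
struct client {
    int session_id;                 /* initialized before publication */
};

struct data_server {
    _Atomic(struct client *) clp;   /* NULL until the client is ready */
};

/* Writer: finish initialization, then publish the pointer. */
static void ds_publish(struct data_server *ds, struct client *clp, int id)
{
    clp->session_id = id;                        /* "do some work" */
    atomic_thread_fence(memory_order_release);   /* stand-in for smp_wmb() */
    atomic_store_explicit(&ds->clp, clp, memory_order_relaxed);
}

/* Reader: test the pointer first, and only then issue the read barrier. */
static int ds_session_id(struct data_server *ds)
{
    struct client *clp =
        atomic_load_explicit(&ds->clp, memory_order_relaxed);

    if (clp == NULL)
        return -1;                               /* not published yet */
    atomic_thread_fence(memory_order_acquire);   /* stand-in for smp_rmb() */
    return clp->session_id;                      /* ordered after the check */
}
```

With the fence placed before the pointer check instead, nothing would order the later read of `session_id` after the load of `clp`, which is the situation the commit message describes as misplaced.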


1043708 Commits