linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-26 14:12:06 +00:00

Author	SHA1	Message	Date
Adam Manzanares	d9a08a9e61	fs: Add aio iopriority support This is the per-I/O equivalent of the ioprio_set system call. When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field. This patch depends on block: add ioprio_check_cap function. Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-05-31 10:50:55 -04:00
Adam Manzanares	fc28724d67	fs: Convert kiocb rw_hint from enum to u16 In order to avoid kiocb bloat for per command iopriority support, rw_hint is converted from enum to a u16. Added a guard around ki_hint assignment. Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-05-31 10:50:54 -04:00
Adam Manzanares	aa43457799	block: add ioprio_check_cap function Aio per command iopriority support introduces a second interface between userland and the kernel capable of passing iopriority. The aio interface also needs the ability to verify that the submitting context has sufficient privileges to submit IOPRIO_RT commands. This patch creates the ioprio_check_cap function to be used by the ioprio_set system call and also by the aio interface. Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-05-31 10:50:54 -04:00
Al Viro	1da92779e2	aio: sanitize the limit checking in io_submit(2) as it is, the logics in native io_submit(2) is "if asked for more than LONG_MAX/sizeof(pointer) iocbs to submit, don't bother with more than LONG_MAX/sizeof(pointer)" (i.e. 512M requests on 32bit and 1E requests on 64bit) while compat io_submit(2) goes with "stop after the first PAGE_SIZE/sizeof(pointer) iocbs", i.e. 1K or so. Which is * inconsistent * way too much in native case * possibly too little in compat one and * wrong anyway, since the natural point where we ought to stop bothering is ctx->nr_events Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-05-29 23:20:17 -04:00
Al Viro	67ba049f94	aio: fold do_io_submit() into callers get rid of insane "copy array of 32bit pointers into an array of native ones" glue. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-05-29 23:19:29 -04:00
Al Viro	95af8496ac	aio: shift copyin of iocb into io_submit_one() Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-05-29 23:18:31 -04:00
Al Viro	d2988bd412	aio_read_events_ring(): make a bit more readable The logics for 'avail' is * not past the tail of cyclic buffer * no more than asked * not past the end of buffer * not past the end of a page Unobfuscate the last part. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-05-29 23:18:17 -04:00
Al Viro	9061d14a8a	aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way ... so just make them return 0 when caller does not need to destroy iocb Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-05-29 23:17:40 -04:00
Al Viro	3c96c7f4ca	aio: take list removal to (some) callers of aio_complete() We really want iocb out of io_cancel(2) reach before we start tearing it down. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-05-29 23:16:43 -04:00
Christoph Hellwig	ac060cbaa8	aio: add missing break for the IOCB_CMD_FDSYNC case Looks like this got lost in a merge. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-05-28 13:40:50 -04:00
Christoph Hellwig	89b310a2b2	random: convert to ->poll_mask The big change is that random_read_wait and random_write_wait are merged into a single waitqueue that uses keyed wakeups. Because wait_event_* doesn't know about that, this will lead to occasional spurious wakeups in _random_read and add_hwgenerator_randomness, but wait_event_* is designed to handle these and we are not in a hot path there. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	652fe8e876	timerfd: convert to ->poll_mask Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	9e42f195f5	eventfd: switch to ->poll_mask Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	dd67081b36	pipe: convert to ->poll_mask Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	b28fc82267	crypto: af_alg: convert to ->poll_mask Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	5001c2dcdf	net/rxrpc: convert to ->poll_mask Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	f87be89481	net/iucv: convert to ->poll_mask Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	e7a98d47ee	net/phonet: convert to ->poll_mask Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	4bac2bcd83	net/nfc: convert to ->poll_mask Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	9490e40a06	net/caif: convert to ->poll_mask Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	17112d8081	net/bluetooth: convert to ->poll_mask Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	568ea88ef9	net/sctp: convert to ->poll_mask Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	4df7338f6f	net/tipc: convert to ->poll_mask Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	31f50b5573	net/vmw_vsock: convert to ->poll_mask Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	9f728af35f	net/atm: convert to ->poll_mask Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	f4335f52bb	net/dccp: convert to ->poll_mask Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	db5051ead6	net: convert datagram_poll users to ->poll_mask Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	e76cd24d02	net/unix: convert to ->poll_mask Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	2c7d3daceb	net/tcp: convert to ->poll_mask Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	984652dd8b	net: remove sock_no_poll Now that sock_poll handles a NULL ->poll or ->poll_mask there is no need for a stub. Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	1525242310	net: add support for ->poll_mask in proto_ops The socket file operations still implement ->poll until all protocols are switched over. Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	3cafb37633	net: refactor socket_poll Factor out two busy poll related helpers for later reuse, and remove a comment that isn't very helpful, especially with the __poll_t annotations in place. Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	1962da0d21	aio: try to complete poll iocbs without context switch If we can acquire ctx_lock without spinning we can just remove our iocb from the active_reqs list, and thus complete the iocbs from the wakeup context. Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	2c14fa838c	aio: implement IOCB_CMD_POLL Simple one-shot poll through the io_submit() interface. To poll for a file descriptor the application should submit an iocb of type IOCB_CMD_POLL. It will poll the fd for the events specified in the first 32 bits of the aio_buf field of the iocb. Unlike poll or epoll without EPOLLONESHOT this interface always works in one shot mode, that is once the iocb is completed, it will have to be resubmitted. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	888933f8fd	aio: simplify cancellation With the current aio code there is no need for the magic KIOCB_CANCELLED value, as a cancellation just kicks the driver to queue the completion ASAP, with all actual completion handling done in another thread. Given that both the completion path and cancellation take the context lock there is no need for magic cmpxchg loops either. If we remove iocbs from the active list after calling ->ki_cancel (but with ctx_lock still held), we can also rely on the invariant that anything found on the list has a ->ki_cancel callback and can be cancelled, further simplifying the code. Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	f3a2752a43	aio: simplify KIOCB_KEY handling No need to pass the key field to lookup_iocb to compare it with KIOCB_KEY, as we can do that right after retrieving it from userspace. Also move the KIOCB_KEY definition to aio.c as it is an internal value not used by any other place in the kernel. Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	3deb642f0d	fs: introduce new ->get_poll_head and ->poll_mask methods ->get_poll_head returns the waitqueue that the poll operation is going to sleep on. Note that this means we can only use a single waitqueue for the poll, unlike some current drivers that use two waitqueues for different events. But now that we have keyed wakeups and heavily use those for poll there aren't that many good reasons left to keep the multiple waitqueues, and if there are any, ->poll is still around, the driver just won't support aio poll. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	9965ed174e	fs: add new vfs_poll and file_can_poll helpers These abstract out calls to the poll method in preparation for changes in how we poll. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	6e8b704df5	fs: update documentation to mention __poll_t and match the code Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	a0f8dcfc60	fs: cleanup do_pollfd Use straightline code with failure handling gotos instead of a lot of nested conditionals. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	8f546ae1fc	fs: unexport poll_schedule_timeout No users outside of select.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	ee219b946e	uapi: turn __poll_t sparse checks on by default Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-05-26 09:16:44 +02:00
Christoph Hellwig	ed0d523adb	Merge branch 'fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into aio-base	2018-05-26 09:16:25 +02:00
Al Viro	4faa99965e	fix io_destroy()/aio_complete() race If io_destroy() gets to cancelling everything that can be cancelled and gets to kiocb_cancel() calling the function driver has left in ->ki_cancel, it becomes vulnerable to a race with IO completion. At that point req is already taken off the list and aio_complete() does NOT spin until we (in free_ioctx_users()) release ->ctx_lock. As the result, it proceeds to kiocb_free(), freeing req just as it gets passed to ->ki_cancel(). Fix is simple - remove from the list after the call of kiocb_cancel(). All instances of ->ki_cancel() already have to cope with being called with iocb still on list - that's what happens in io_cancel(2). Cc: stable@kernel.org Fixes: `0460fef2a9` "aio: use cancellation list lazily" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-05-23 22:53:22 -04:00
Al Viro	baf10564fb	aio: fix io_destroy(2) vs. lookup_ioctx() race kill_ioctx() used to have an explicit RCU delay between removing the reference from ->ioctx_table and percpu_ref_kill() dropping the refcount. At some point that delay had been removed, on the theory that percpu_ref_kill() itself contained an RCU delay. Unfortunately, that was the wrong kind of RCU delay and it didn't care about rcu_read_lock() used by lookup_ioctx(). As the result, we could get ctx freed right under lookup_ioctx(). Tejun has fixed that in `a6d7cff472` ("fs/aio: Add explicit RCU grace period when freeing kioctx"); however, that fix is not enough. Suppose io_destroy() from one thread races with e.g. io_setup() from another; CPU1 removes the reference from current->mm->ioctx_table[...] just as CPU2 has picked it (under rcu_read_lock()). Then CPU1 proceeds to drop the refcount, getting it to 0 and triggering a call of free_ioctx_users(), which proceeds to drop the secondary refcount and once that reaches zero calls free_ioctx_reqs(). That does INIT_RCU_WORK(&ctx->free_rwork, free_ioctx); queue_rcu_work(system_wq, &ctx->free_rwork); and schedules freeing the whole thing after RCU delay. In the meanwhile CPU2 has gotten around to percpu_ref_get(), bumping the refcount from 0 to 1 and returned the reference to io_setup(). Tejun's fix (that queue_rcu_work() in there) guarantees that ctx won't get freed until after percpu_ref_get(). Sure, we'd increment the counter before ctx can be freed. Now we are out of rcu_read_lock() and there's nothing to stop freeing of the whole thing. Unfortunately, CPU2 assumes that since it has grabbed the reference, ctx is NOT going away until it gets around to dropping that reference. The fix is obvious - use percpu_ref_tryget_live() and treat failure as miss. It's not costlier than what we currently do in normal case, it's safe to call since freeing is delayed and it closes the race window - either lookup_ioctx() comes before percpu_ref_kill() (in which case ctx->users won't reach 0 until the caller of lookup_ioctx() drops it) or lookup_ioctx() fails, ctx->users is unaffected and caller of lookup_ioctx() doesn't see the object in question at all. Cc: stable@kernel.org Fixes: `a6d7cff472` "fs/aio: Add explicit RCU grace period when freeing kioctx" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-05-21 14:30:11 -04:00
Al Viro	5aa1437d2d	ext2: fix a block leak open file, unlink it, then use ioctl(2) to make it immutable or append only. Now close it and watch the blocks not freed... Immutable/append-only checks belong in ->setattr(). Note: the bug is old and backport to anything prior to `737f2e93b9` ("ext2: convert to use the new truncate convention") will need these checks lifted into ext2_setattr(). Cc: stable@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-05-21 14:30:11 -04:00
Al Viro	3819bb0d79	nfsd: vfs_mkdir() might succeed leaving dentry negative unhashed That can (and does, on some filesystems) happen - ->mkdir() (and thus vfs_mkdir()) can legitimately leave its argument negative and just unhash it, counting upon the lookup to pick the object we'd created next time we try to look at that name. Some vfs_mkdir() callers forget about that possibility... Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-05-21 14:30:10 -04:00
Al Viro	9c3e9025a3	cachefiles: vfs_mkdir() might succeed leaving dentry negative unhashed That can (and does, on some filesystems) happen - ->mkdir() (and thus vfs_mkdir()) can legitimately leave its argument negative and just unhash it, counting upon the lookup to pick the object we'd created next time we try to look at that name. Some vfs_mkdir() callers forget about that possibility... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-05-21 14:30:10 -04:00
Al Viro	7b745a4e40	unfuck sysfs_mount() new_sb is left uninitialized in case of early failures in kernfs_mount_ns(), and while IS_ERR(root) is true in all such cases, using IS_ERR(root) || !new_sb is not a solution - IS_ERR(root) is true in some cases when new_sb is true. Make sure new_sb is initialized (and matches the reality) in all cases and fix the condition for dropping kobj reference - we want it done precisely in those situations where the reference has not been transferred into a new super_block instance. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-05-21 14:30:09 -04:00
Al Viro	82382acec0	kernfs: deal with kernfs_fill_super() failures make sure that info->node is initialized early, so that kernfs_kill_sb() can list_del() it safely. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-05-21 14:30:08 -04:00
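
Two of the aio commits above describe small pieces of bounds arithmetic: 1da92779e2 says io_submit(2) should stop bothering at ctx->nr_events, and d2988bd412 lists the four limits on 'avail' in aio_read_events_ring(). The following is a rough userspace sketch of both rules, not the kernel code. The function names clamp_submit_count and ring_avail, and every parameter name, are made up for illustration, and the sketch assumes event slots start at offset 0 of the first page, which ignores the ring header the real buffer carries.

```c
#include <errno.h>

/*
 * Hypothetical helper (not a kernel function): limit how many iocbs one
 * io_submit(2) call looks at. A negative count is an error; anything above
 * the number of events the context can hold is cut down to that number.
 */
long clamp_submit_count(long nr, unsigned int nr_events)
{
	if (nr < 0)
		return -EINVAL;
	if ((unsigned long)nr > nr_events)
		nr = (long)nr_events;
	return nr;
}

/*
 * Hypothetical helper (not a kernel function): how many events can be copied
 * out of a cyclic event buffer in one step, starting at slot 'head'.
 * Assumes slot 0 sits at the start of a page (no ring header).
 */
unsigned int ring_avail(unsigned int head, unsigned int tail,
			unsigned int nr_events,
			unsigned int events_per_page,
			unsigned int wanted)
{
	unsigned int avail, left_in_page;

	/* Not past the tail; if the tail has wrapped, not past the end of the buffer. */
	avail = (head <= tail ? tail : nr_events) - head;

	/* No more than asked. */
	if (avail > wanted)
		avail = wanted;

	/* Not past the end of the page that holds slot 'head'. */
	left_in_page = events_per_page - head % events_per_page;
	if (avail > left_in_page)
		avail = left_in_page;

	return avail;
}
```

A caller would invoke ring_avail() in a loop, advancing head by the returned count (modulo nr_events) until it returns 0 or enough events have been collected. With head = 30, tail = 40 and 32 events per page, the first pass yields only 2 events because of the page boundary, and the next pass picks up the remaining 8.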

752815 Commits