Change all the callers of ->readpage to call ->read_folio in preference,
if it exists. This is a transitional duplication, and will be removed
by the end of the series.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Turns out that ever since this mount option was added, passing
`softreval` in NFS mount options cancelled all other flags while not
affecting the underlying flag `NFS_MOUNT_SOFTREVAL`.
Fixes: c74dfe97c1 ("NFS: Add mount option 'softreval'")
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
EAGAIN doesn't guarantee to have a free section. Let's report it.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If the number of unusable blocks is not larger than
unusable capacity, we can skip GC when checkpoint
disabling.
Signed-off-by: Weichao Guo <guoweichao@oppo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
[Jaegeuk Kim: Fix missing gc_mode assignment]
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This adds the overflow processing for large CQE's.
This adds two parameters to the io_cqring_event_overflow function and
uses these fields to initialize the large CQE fields.
Allocate enough space for large CQE's in the overflow structue. If no
large CQE's are used, the size of the allocation is unchanged.
The cqe field can have a different size depending if its a large
CQE or not. To be able to allocate different sizes, the two fields
in the structure are re-ordered.
Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220426182134.136504-9-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Normal SQEs are 64-bytes in length, which is fine for all the commands
we support. However, in preparation for supporting passthrough IO,
provide an option for setting up a ring with 128-byte SQEs.
We continue to use the same type for io_uring_sqe, it's marked and
commented with a zero sized array pad at the end. This provides up
to 80 bytes of data for a passthrough command - 64 bytes for the
extra added data, and 16 bytes available at the end of the existing
SQE.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* for-5.19/io_uring-socket:
io_uring: use the text representation of ops in trace
io_uring: rename op -> opcode
io_uring: add io_uring_get_opcode
io_uring: add type to op enum
io_uring: add socket(2) support
net: add __sys_socket_file()
io_uring: fix trace for reduced sqe padding
io_uring: add fgetxattr and getxattr support
io_uring: add fsetxattr and setxattr support
fs: split off do_getxattr from getxattr
fs: split off setxattr_copy and do_setxattr function from setxattr
* for-5.19/io_uring: (85 commits)
io_uring: don't clear req->kbuf when buffer selection is done
io_uring: eliminate the need to track provided buffer ID separately
io_uring: move provided buffer state closer to submit state
io_uring: move provided and fixed buffers into the same io_kiocb area
io_uring: abstract out provided buffer list selection
io_uring: never call io_buffer_select() for a buffer re-select
io_uring: get rid of hashed provided buffer groups
io_uring: always use req->buf_index for the provided buffer group
io_uring: ignore ->buf_index if REQ_F_BUFFER_SELECT isn't set
io_uring: kill io_rw_buffer_select() wrapper
io_uring: make io_buffer_select() return the user address directly
io_uring: kill io_recv_buffer_select() wrapper
io_uring: use 'sr' vs 'req->sr_msg' consistently
io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg
io_uring: check IOPOLL/ioprio support upfront
io_uring: replace smp_mb() with smp_mb__after_atomic() in io_sq_thread()
io_uring: add IORING_SETUP_TASKRUN_FLAG
io_uring: use TWA_SIGNAL_NO_IPI if IORING_SETUP_COOP_TASKRUN is used
io_uring: set task_work notify method at init time
io-wq: use __set_notify_signal() to wake workers
...
It's not needed as the REQ_F_BUFFER_SELECTED flag tracks the state of
whether or not kbuf is valid, so just drop it.
Suggested-by: Dylan Yudaken <dylany@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We have io_kiocb->buf_index which is used for either fixed buffers, or
for provided buffers. For the latter, it's used to hold the buffer group
ID for buffer selection. Post selection, req->kbuf->bid is used to get
the buffer ID.
Store the buffer ID, when selected, in req->buf_index. If we do end up
recycling the buffer, reset it back to the buffer group ID.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
These are mutually exclusive - if you use provided buffers, then you
cannot use fixed buffers and vice versa. Move them into the same spot
in the io_kiocb, which is also advantageous for provided buffers as
they get near the submit side hot cacheline.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Callers already have room to store the addr and length information,
clean it up by having the caller just assign the previously provided
data.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use a plain array for any group ID that's less than 64, and punt
anything beyond that to an xarray. 64 fits in a page even for 4KB
page sizes and with the planned additions.
This makes the expected group usage faster by avoiding a hash and lookup
to find our list, and it uses less memory upfront by not allocating any
memory for provided buffers unless it's actually being used.
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The read/write opcodes use it already, but the recv/recvmsg do not. If
we switch them over and read and validate this at init time while we're
checking if the opcode supports it anyway, then we can do it in one spot
and we don't have to pass in a separate group ID for io_buffer_select().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There's no point in validity checking buf_index if the request doesn't
have REQ_F_BUFFER_SELECT set, as we will never use it for that case.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
After the recent changes, this is direct call to io_buffer_select()
anyway. With this change, there are no wrappers left for provided
buffer selection.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dirent events (create/delete/move) are only reported on watched
directory inodes, but in fanotify as well as in legacy inotify, it was
always allowed to set them on non-dir inode, which does not result in
any meaningful outcome.
Until kernel v5.17, dirent events in fanotify also differed from events
"on child" (e.g. FAN_OPEN) in the information provided in the event.
For example, FAN_OPEN could be set in the mask of a non-dir or the mask
of its parent and event would report the fid of the child regardless of
the marked object.
By contrast, FAN_DELETE is not reported if the child is marked and the
child fid was not reported in the events.
Since kernel v5.17, with fanotify group flag FAN_REPORT_TARGET_FID, the
fid of the child is reported with dirent events, like events "on child",
which may create confusion for users expecting the same behavior as
events "on child" when setting events in the mask on a child.
The desired semantics of setting dirent events in the mask of a child
are not clear, so for now, deny this action for a group initialized
with flag FAN_REPORT_TARGET_FID and for the new event FAN_RENAME.
We may relax this restriction in the future if we decide on the
semantics and implement them.
Fixes: d61fd650e9 ("fanotify: introduce group flag FAN_REPORT_TARGET_FID")
Fixes: 8cc3b1ccd9 ("fanotify: wire up FAN_RENAME event")
Link: https://lore.kernel.org/linux-fsdevel/20220505133057.zm5t6vumc4xdcnsg@quack3.lan/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220507080028.219826-1-amir73il@gmail.com
This is a clean up patch that skips the flip flag logic for delayed attr
renames. Since the log replay keeps the inode locked, we do not need to
worry about race windows with attr lookups. So we can skip over
flipping the flag and the extra transaction roll for it
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
In iomap_write_end(), only call iomap_write_failed() on the byte range
that has failed. This should improve code readability, but doesn't fix
an actual bug because iomap_write_failed() is called after updating the
file size here and it only affects the memory beyond the end of the
file.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
The @lend parameter of truncate_pagecache_range() should be the offset
of the last byte of the hole, not the first byte beyond it.
Fixes: ae259a9c85 ("fs: introduce iomap infrastructure")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
- Calculate iblock directly instead of using a while loop
- Move has_buffers to the end to remove a backwards jump
- Use __filemap_get_folio() instead of grab_cache_page(), which
removes a spurious FGP_ACCESSED flag.
- Eliminate length and pos variables
- Use folio APIs where they exist
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Removes a couple of calls to compound_head and saves a few bytes.
Also convert verity's read_file_data_page() to be folio-based.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Pass a folio instead of a page to aops->is_dirty_writeback().
Convert both implementations and the caller.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
pagecache_write_begin() and pagecache_write_end() are now trivial
wrappers, so call the aops directly.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
pagecache_write_begin() and pagecache_write_end() are now trivial
wrappers, so call the aops directly.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
There is only one kind of write_begin/write_end aops, so we don't need
to look up which aop it is, just make hfsplus_write_begin() available to
this file and call it directly.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
There is only one kind of write_begin/write_end aops, so we don't need
to look up which aop it is, just make hfs_write_begin() available to
this file and call it directly.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
After the last patch, Smatch reports:
fs/ntfs3/file.c:168 ntfs_extend_initialized_size()
error: uninitialized symbol 'fsdata'.
fsdata is indeed unused. This is not new, but Smatch couldn't see it
before because calls through pagecache_write_begin()/pagecache_write_end()
could theoretically call any implemention of ->write_begin/write_end,
some of which do use fsdata. Now that the calls are direct, Smatch can
see they're never used.
Fix this by simply passing NULL. While ntfs3 does pass this parameter
on to generic functions, those generic functions also never dereference
the fsdata parameter, so it's unnecessary to pass the address of a real
pointer.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
There is only one kind of write_begin/write_end aops, so we don't need to
look up which aop it is, just make ntfs_write_begin() and ntfs_write_end()
available to this file and call them directly.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
pagecache_write_begin() and pagecache_write_end() are now trivial
wrappers, so call the aops directly.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
pagecache_write_begin() and pagecache_write_end() are now trivial
wrappers, so call the aops directly.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
There are no more aop flags left, so remove the parameter.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
There are no more aop flags left, so remove the parameter.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
There are no more aop flags left, so remove the parameter.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
There are no more aop flags left, so remove the parameter.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>