linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-24 21:21:41 +00:00

Author	SHA1	Message	Date
Linus Torvalds	241c7ed4d4	vfs-6.13.untorn.writes -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcopwAKCRCRxhvAZXjc oitWAQD68PGFI6/ES9x+qGsDFEZBH08icuO+a9dyaZXyNRosDgD/ex2zHj6F7IzS Ghgb9jiqWQ8l2+PDYfisxa/0jiqCbAk= =DmXf -----END PGP SIGNATURE----- Merge tag 'vfs-6.13.untorn.writes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs untorn write support from Christian Brauner: "An atomic write is a write issed with torn-write protection. This means for a power failure or any hardware failure all or none of the data from the write will be stored, never a mix of old and new data. This work is already supported for block devices. If a block device is opened with O_DIRECT and the block device supports atomic write, then FMODE_CAN_ATOMIC_WRITE is added to the file of the opened block device. This contains the work to expand atomic write support to filesystems, specifically ext4 and XFS. Currently, only support for writing exactly one filesystem block atomically is added. Since it's now possible to have filesystem block size > page size for XFS, it's possible to write 4K+ blocks atomically on x86" * tag 'vfs-6.13.untorn.writes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: drop an obsolete comment in iomap_dio_bio_iter ext4: Do not fallback to buffered-io for DIO atomic write ext4: Support setting FMODE_CAN_ATOMIC_WRITE ext4: Check for atomic writes support in write iter ext4: Add statx support for atomic writes xfs: Support setting FMODE_CAN_ATOMIC_WRITE xfs: Validate atomic writes xfs: Support atomic write for statx fs: iomap: Atomic write support fs: Export generic_atomic_write_valid() block: Add bdev atomic write limits helpers fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() block/fs: Pass an iocb to generic_atomic_write_valid()	2024-11-18 11:30:09 -08:00
Linus Torvalds	70e7730c2a	vfs-6.13.misc -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcToAAKCRCRxhvAZXjc osL9AP948FFumJRC28gDJ4xp+X4eohNOfkgoEG8FTbF2zU6ulwD+O0pr26FqpFli pqlG+38UdATImpfqqWjPbb72sBYcfQg= =wLUh -----END PGP SIGNATURE----- Merge tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - Fixup and improve NLM and kNFSD file lock callbacks Last year both GFS2 and OCFS2 had some work done to make their locking more robust when exported over NFS. Unfortunately, part of that work caused both NLM (for NFS v3 exports) and kNFSD (for NFSv4.1+ exports) to no longer send lock notifications to clients This in itself is not a huge problem because most NFS clients will still poll the server in order to acquire a conflicted lock It's important for NLM and kNFSD that they do not block their kernel threads inside filesystem's file_lock implementations because that can produce deadlocks. We used to make sure of this by only trusting that posix_lock_file() can correctly handle blocking lock calls asynchronously, so the lock managers would only setup their file_lock requests for async callbacks if the filesystem did not define its own lock() file operation However, when GFS2 and OCFS2 grew the capability to correctly handle blocking lock requests asynchronously, they started signalling this behavior with EXPORT_OP_ASYNC_LOCK, and the check for also trusting posix_lock_file() was inadvertently dropped, so now most filesystems no longer produce lock notifications when exported over NFS Fix this by using an fop_flag which greatly simplifies the problem and grooms the way for future uses by both filesystems and lock managers alike - Add a sysctl to delete the dentry when a file is removed instead of making it a negative dentry Commit `681ce86235` ("vfs: Delete the associated dentry when deleting a file") introduced an unconditional deletion of the associated dentry when a file is removed. However, this led to performance regressions in specific benchmarks, such as ilebench.sum_operations/s, prompting a revert in commit `4a4be1ad3a` ("Revert "vfs: Delete the associated dentry when deleting a file""). This reintroduces the concept conditionally through a sysctl - Expand the statmount() system call: * Report the filesystem subtype in a new fs_subtype field to e.g., report fuse filesystem subtypes * Report the superblock source in a new sb_source field * Add a new way to return filesystem specific mount options in an option array that returns filesystem specific mount options separated by zero bytes and unescaped. This allows caller's to retrieve filesystem specific mount options and immediately pass them to e.g., fsconfig() without having to unescape or split them * Report security (LSM) specific mount options in a separate security option array. We don't lump them together with filesystem specific mount options as security mount options are generic and most users aren't interested in them The format is the same as for the filesystem specific mount option array - Support relative paths in fsconfig()'s FSCONFIG_SET_STRING command - Optimize acl_permission_check() to avoid costly {g,u}id ownership checks if possible - Use smp_mb__after_spinlock() to avoid full smp_mb() in evict() - Add synchronous wakeup support for ep_poll_callback. Currently, epoll only uses wake_up() to wake up task. But sometimes there are epoll users which want to use the synchronous wakeup flag to give a hint to the scheduler, e.g., the Android binder driver. So add a wake_up_sync() define, and use wake_up_sync() when sync is true in ep_poll_callback() Fixes: - Fix kernel documentation for inode_insert5() and iget5_locked() - Annotate racy epoll check on file->f_ep - Make F_DUPFD_QUERY associative - Avoid filename buffer overrun in initramfs - Don't let statmount() return empty strings - Add a cond_resched() to dump_user_range() to avoid hogging the CPU - Don't query the device logical blocksize multiple times for hfsplus - Make filemap_read() check that the offset is positive or zero Cleanups: - Various typo fixes - Cleanup wbc_attach_fdatawrite_inode() - Add __releases annotation to wbc_attach_and_unlock_inode() - Add hugetlbfs tracepoints - Fix various vfs kernel doc parameters - Remove obsolete TODO comment from io_cancel() - Convert wbc_account_cgroup_owner() to take a folio - Fix comments for BANDWITH_INTERVAL and wb_domain_writeout_add() - Reorder struct posix_acl to save 8 bytes - Annotate struct posix_acl with __counted_by() - Replace one-element array with flexible array member in freevxfs - Use idiomatic atomic64_inc_return() in alloc_mnt_ns()" * tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits) statmount: retrieve security mount options vfs: make evict() use smp_mb__after_spinlock instead of smp_mb statmount: add flag to retrieve unescaped options fs: add the ability for statmount() to report the sb_source writeback: wbc_attach_fdatawrite_inode out of line writeback: add a __releases annoation to wbc_attach_and_unlock_inode fs: add the ability for statmount() to report the fs_subtype fs: don't let statmount return empty strings fs:aio: Remove TODO comment suggesting hash or array usage in io_cancel() hfsplus: don't query the device logical block size multiple times freevxfs: Replace one-element array with flexible array member fs: optimize acl_permission_check() initramfs: avoid filename buffer overrun fs/writeback: convert wbc_account_cgroup_owner to take a folio acl: Annotate struct posix_acl with __counted_by() acl: Realign struct posix_acl to save 8 bytes epoll: Add synchronous wakeup support for ep_poll_callback coredump: add cond_resched() to dump_user_range mm/page-writeback.c: Fix comment of wb_domain_writeout_add() mm/page-writeback.c: Update comment for BANDWIDTH_INTERVAL ...	2024-11-18 09:35:30 -08:00
Christoph Hellwig	54079430c5	iomap: drop an obsolete comment in iomap_dio_bio_iter No more zone append special casing in iomap for quite a while. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241111121340.1390540-1-hch@lst.de Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-11-11 14:35:06 +01:00
John Garry	9e0933c21c	fs: iomap: Atomic write support Support direct I/O atomic writes by producing a single bio with REQ_ATOMIC flag set. Initially FSes (XFS) should only support writing a single FS block atomically. As with any atomic write, we should produce a single bio which covers the complete write length. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> [djwong: clarify a couple of things in the docs] Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2024-11-04 16:14:02 -08:00
Linus Torvalds	17fa6a5f93	vfs-6.12-rc6.iomap -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZyTGVAAKCRCRxhvAZXjc oltEAP9r8cWa3Tdv8DzMNWu/jezTUXoW/mX5Qe+c1L6faqj0WQD/dIVtBtG37Tfq 3Ci9F/GEWjKijtCQ5lwMGUq27jQJ1gk= =/0iA -----END PGP SIGNATURE----- Merge tag 'vfs-6.12-rc6.iomap' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull iomap fixes from Christian Brauner: "Fixes for iomap to prevent data corruption bugs in the fallocate unshare range implementation of fsdax and a small cleanup to turn iomap_want_unshare_iter() into an inline function" * tag 'vfs-6.12-rc6.iomap' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: iomap: turn iomap_want_unshare_iter into an inline function fsdax: dax_unshare_iter needs to copy entire blocks fsdax: remove zeroing code from dax_unshare_iter iomap: share iomap_unshare_iter predicate code with fsdax xfs: don't allocate COW extents when unsharing a hole	2024-11-01 07:45:00 -10:00
Pankaj Raghav	30dac24e14	fs/writeback: convert wbc_account_cgroup_owner to take a folio Most of the callers of wbc_account_cgroup_owner() are converting a folio to page before calling the function. wbc_account_cgroup_owner() is converting the page back to a folio to call mem_cgroup_css_from_folio(). Convert wbc_account_cgroup_owner() to take a folio instead of a page, and convert all callers to pass a folio directly except f2fs. Convert the page to folio for all the callers from f2fs as they were the only callers calling wbc_account_cgroup_owner() with a page. As f2fs is already in the process of converting to folios, these call sites might also soon be calling wbc_account_cgroup_owner() with a folio directly in the future. No functional changes. Only compile tested. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20240926140121.203821-1-kernel@pankajraghav.com Acked-by: David Sterba <dsterba@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-10-28 13:26:54 +01:00
Christoph Hellwig	6db388585e	iomap: turn iomap_want_unshare_iter into an inline function iomap_want_unshare_iter currently sits in fs/iomap/buffered-io.c, which depends on CONFIG_BLOCK. It is also in used in fs/dax.c whіch has no such dependency. Given that it is a trivial check turn it into an inline in include/linux/iomap.h to fix the DAX && !BLOCK build. Fixes: `6ef6a0e821` ("iomap: share iomap_unshare_iter predicate code with fsdax") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241015041350.118403-1-hch@lst.de Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-10-21 17:01:01 +02:00
Christoph Hellwig	b784951662	iomap: move locking out of iomap_write_delalloc_release XFS (which currently is the only user of iomap_write_delalloc_release) already holds invalidate_lock for most zeroing operations. To be able to avoid a deadlock it needs to stop taking the lock, but doing so in iomap would leak XFS locking details into iomap. To avoid this require the caller to hold invalidate_lock when calling iomap_write_delalloc_release instead of taking it there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2024-10-15 11:37:42 +02:00
Christoph Hellwig	caf0ea451d	iomap: remove iomap_file_buffered_write_punch_delalloc Currently iomap_file_buffered_write_punch_delalloc can be called from XFS either with the invalidate lock held or not. To fix this while keeping the locking in the file system and not the iomap library code we'll need to life the locking up into the file system. To prepare for that, open code iomap_file_buffered_write_punch_delalloc in the only caller, and instead export iomap_write_delalloc_release. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2024-10-15 11:37:42 +02:00
Christoph Hellwig	c0adf8c3a9	iomap: factor out a iomap_last_written_block helper Split out a pice of logic from iomap_file_buffered_write_punch_delalloc that is useful for all iomap_end implementations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2024-10-15 11:37:41 +02:00
Darrick J. Wong	6ef6a0e821	iomap: share iomap_unshare_iter predicate code with fsdax The predicate code that iomap_unshare_iter uses to decide if it's really needs to unshare a file range mapping should be shared with the fsdax version, because right now they're opencoded and inconsistent. Note that we simplify the predicate logic a bit -- we no longer allow unsharing of inline data mappings, but there aren't any filesystems that allow shared inline data currently. This is a fix in the sense that it should have been ported to fsdax. Fixes: `b53fdb215d` ("iomap: improve shared block detection in iomap_unshare_iter") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/172796813294.1131942.15762084021076932620.stgit@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-10-07 13:51:47 +02:00
Darrick J. Wong	a311a08a42	iomap: constrain the file range passed to iomap_file_unshare File contents can only be shared (i.e. reflinked) below EOF, so it makes no sense to try to unshare ranges beyond EOF. Constrain the file range parameters here so that we don't have to do that in the callers. Fixes: `5f4e5752a8` ("fs: add iomap_file_dirty") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20241002150213.GC21853@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-10-03 10:22:28 +02:00
Darrick J. Wong	f7a4874d97	iomap: don't bother unsharing delalloc extents If unshare encounters a delalloc reservation in the srcmap, that means that the file range isn't shared because delalloc reservations cannot be reflinked. Therefore, don't try to unshare them. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20241002150040.GB21853@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-10-03 10:22:25 +02:00
Linus Torvalds	171754c380	vfs-6.12.blocksize -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEvwAKCRCRxhvAZXjc ohg3APwJWQnqFlBddcRl4yrPJ/cgcYSYAOdHb+E+blomSwdxcwEAmwsnLPNQOtw2 rxKvQfZqhVT437bl7RpPPZrHGxwTng8= =6v1r -----END PGP SIGNATURE----- Merge tag 'vfs-6.12.blocksize' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs blocksize updates from Christian Brauner: "This contains the vfs infrastructure as well as the xfs bits to enable support for block sizes (bs) larger than page sizes (ps) plus a few fixes to related infrastructure. There has been efforts over the last 16 years to enable enable Large Block Sizes (LBS), that is block sizes in filesystems where bs > page size. Through these efforts we have learned that one of the main blockers to supporting bs > ps in filesystems has been a way to allocate pages that are at least the filesystem block size on the page cache where bs > ps. Thanks to various previous efforts it is possible to support bs > ps in XFS with only a few changes in XFS itself. Most changes are to the page cache to support minimum order folio support for the target block size on the filesystem. A motivation for Large Block Sizes today is to support high-capacity (large amount of Terabytes) QLC SSDs where the internal Indirection Unit (IU) are typically greater than 4k to help reduce DRAM and so in turn cost and space. In practice this then allows different architectures to use a base page size of 4k while still enabling support for block sizes aligned to the larger IUs by relying on high order folios on the page cache when needed. It also allows to take advantage of the drive's support for atomics larger than 4k with buffered IO support in Linux. As described this year at LSFMM, supporting large atomics greater than 4k enables databases to remove the need to rely on their own journaling, so they can disable double buffered writes, which is a feature different cloud providers are already enabling through custom storage solutions" * tag 'vfs-6.12.blocksize' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (22 commits) Documentation: iomap: fix a typo iomap: remove the iomap_file_buffered_write_punch_delalloc return value iomap: pass the iomap to the punch callback iomap: pass flags to iomap_file_buffered_write_punch_delalloc iomap: improve shared block detection in iomap_unshare_iter iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release docs:filesystems: fix spelling and grammar mistakes in iomap design page filemap: fix htmldoc warning for mapping_align_index() iomap: make zero range flush conditional on unwritten mappings iomap: fix handling of dirty folios over unwritten extents iomap: add a private argument for iomap_file_buffered_write iomap: remove set_memor_ro() on zero page xfs: enable block size larger than page size support xfs: make the calculation generic in xfs_sb_validate_fsb_count() xfs: expose block size in stat xfs: use kvmalloc for xattr buffers iomap: fix iomap_dio_zero() for fs bs > system page size filemap: cap PTE range to be created to allowed zero fill in folio_map_range() mm: split a folio in minimum folio order chunks readahead: allocate folios with mapping_min_order in readahead ...	2024-09-20 17:53:17 -07:00
Christoph Hellwig	4bceb9ba05	iomap: remove the iomap_file_buffered_write_punch_delalloc return value iomap_file_buffered_write_punch_delalloc can only return errors if either the ->punch callback returned an error, or if someone changed the API of mapping_seek_hole_data to return a negative error code that is not -ENXIO. As the only instance of ->punch never returns an error, an such an error would be fatal anyway remove the entire error propagation and don't return an error code from iomap_file_buffered_write_punch_delalloc. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240910043949.3481298-6-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-10 11:14:15 +02:00
Christoph Hellwig	492f53758f	iomap: pass the iomap to the punch callback XFS will need to look at the flags in the iomap structure, so pass it down all the way to the callback. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240910043949.3481298-5-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-10 11:14:15 +02:00
Christoph Hellwig	11596dc3df	iomap: pass flags to iomap_file_buffered_write_punch_delalloc To fix short write error handling, We'll need to figure out what operation iomap_file_buffered_write_punch_delalloc is called for. Pass the flags argument on to it, and reorder the argument list to match that of ->iomap_end so that the compiler only has to add the new punch argument to the end of it instead of reshuffling the registers. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240910043949.3481298-4-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-10 11:14:14 +02:00
Christoph Hellwig	b53fdb215d	iomap: improve shared block detection in iomap_unshare_iter Currently iomap_unshare_iter relies on the IOMAP_F_SHARED flag to detect blocks to unshare. This is reasonable, but IOMAP_F_SHARED is also useful for the file system to do internal book keeping for out of place writes. XFS used to that, until it got removed in commit `72a048c105` ("xfs: only set IOMAP_F_SHARED when providing a srcmap to a write") because unshare for incorrectly unshare such blocks. Add an extra safeguard by checking the explicitly provided srcmap instead of the fallback to the iomap for valid data, as that catches the case where we'd just copy from the same place we'd write to easily, allowing to reinstate setting IOMAP_F_SHARED for all XFS writes that go to the COW fork. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240910043949.3481298-3-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-10 11:13:43 +02:00
Christoph Hellwig	7a9d43eace	iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release When direct I/O completions invalidates the page cache it holds neither the i_rwsem nor the invalidate_lock so it can be racing with iomap_write_delalloc_release. If the search for the end of the region that contains data returns the start offset we hit such a race and just need to look for the end of the newly created hole instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240910043949.3481298-2-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-10 11:13:43 +02:00
Brian Foster	7d9b474ee4	iomap: make zero range flush conditional on unwritten mappings iomap_zero_range() flushes pagecache to mitigate consistency problems with dirty pagecache and unwritten mappings. The flush is unconditional over the entire range because checking pagecache state after mapping lookup is racy with writeback and reclaim. There are ways around this using iomap's mapping revalidation mechanism, but this is not supported by all iomap based filesystems and so is not a generic solution. There is another way around this limitation that is good enough to filter the flush for most cases in practice. If we check for dirty pagecache over the target range (instead of unconditionally flush), we can keep track of whether the range was dirty before lookup and defer the flush until/unless we see a combination of dirty cache backed by an unwritten mapping. We don't necessarily know whether the dirty cache was backed by the unwritten maping or some other (written) part of the range, but the impliciation of a false positive here is a spurious flush and thus relatively harmless. Note that we also flush for hole mappings because iomap_zero_range() is used for partial folio zeroing in some cases. For example, if a folio straddles EOF on a sub-page FSB size fs, the post-eof portion is hole-backed and dirtied/written via mapped write, and then i_size increases before writeback can occur (which otherwise zeroes the post-eof portion of the EOF folio), then the folio becomes inconsistent with disk until reclaimed. A flush in this case executes partial zeroing from writeback, and iomap knows that there is otherwise no I/O to submit for hole backed mappings. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20240830145634.138439-3-bfoster@redhat.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-03 15:01:24 +02:00
Brian Foster	c5c810b94c	iomap: fix handling of dirty folios over unwritten extents The iomap zero range implementation doesn't properly handle dirty pagecache over unwritten mappings. It skips such mappings as if they were pre-zeroed. If some part of an unwritten mapping is dirty in pagecache from a previous write, the data in cache should be zeroed as well. Instead, the data is left in cache and creates a stale data exposure problem if writeback occurs sometime after the zero range. Most callers are unaffected by this because the higher level filesystem contexts that call zero range typically perform a filemap flush of the target range for other reasons. A couple contexts that don't otherwise need to flush are write file size extension and truncate in XFS. The former path is currently susceptible to the stale data exposure problem and the latter performs a flush specifically to work around it. This is clearly inconsistent and incomplete. As a first step toward correcting behavior, lift the XFS workaround to iomap_zero_range() and unconditionally flush the range before the zero range operation proceeds. While this appears to be a bit of a big hammer, most all users already do this from calling context save for the couple of exceptions noted above. Future patches will optimize or elide this flush while maintaining functional correctness. Fixes: `ae259a9c85` ("fs: introduce iomap infrastructure") Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20240830145634.138439-2-bfoster@redhat.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-03 15:01:24 +02:00
Josef Bacik	31754ea6cb	iomap: add a private argument for iomap_file_buffered_write In order to switch fuse over to using iomap for buffered writes we need to be able to have the struct file for the original write, in case we have to read in the page to make it uptodate. Handle this by using the existing private field in the iomap_iter, and add the argument to iomap_file_buffered_write. This will allow us to pass the file in through the iomap buffered write path, and is flexible for any other file systems needs. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/7f55c7c32275004ba00cddf862d970e6e633f750.1724755651.git.josef@toxicpanda.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-03 15:01:23 +02:00
Luis Chamberlain	d1dd75dcda	iomap: remove set_memor_ro() on zero page Stephen reported a boot failure on ppc power8 system where set_memor_ro() on the new zero page failed [0]. Christophe Leroy further clarifies we can't use this on on linear memory on ppc, and so instead of special casing this just for PowerPC [2] remove the call as suggested by Darrick. [0] https://lore.kernel.org/all/20240826175931.1989f99e@canb.auug.org.au/T/#u [1] https://lore.kernel.org/all/b0fe75b4-c1bb-47f7-a7c3-2534b31c1780@csgroup.eu/ [2] https://lore.kernel.org/all/ZszrJkFOpiy5rCma@bombadil.infradead.org/ Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20240826212632.2098685-1-mcgrof@kernel.org Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-03 15:01:23 +02:00
Pankaj Raghav	10553a9165	iomap: fix iomap_dio_zero() for fs bs > system page size iomap_dio_zero() will pad a fs block with zeroes if the direct IO size < fs block size. iomap_dio_zero() has an implicit assumption that fs block size < page_size. This is true for most filesystems at the moment. If the block size > page size, this will send the contents of the page next to zero page(as len > PAGE_SIZE) to the underlying block device, causing FS corruption. iomap is a generic infrastructure and it should not make any assumptions about the fs block size and the page size of the system. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20240822135018.1931258-7-kernel@pankajraghav.com Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-02 16:19:43 +02:00
Matthew Wilcox (Oracle)	97edbc02b2	buffer: Convert block_write_end() to take a folio All callers now have a folio, so pass it in instead of converting from a folio to a page and back to a folio again. Saves a call to compound_head(). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-08-07 11:31:59 +02:00
Linus Torvalds	4f5e249ec0	vfs-6.11.iomap -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZpEHLQAKCRCRxhvAZXjc ot3sAP9TBUM+vzUcQ5SVcUnSX+y3dhOGYnquORBbRc/Y6AzLMAEAu3TcsvdoaWfy 6ImUaju6iLqy9cCY3uDlNmJR16E4IgE= =Bwpy -----END PGP SIGNATURE----- Merge tag 'vfs-6.11.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull iomap updates from Christian Brauner: "This contains some minor work for the iomap subsystem: - Add documentation on the design of iomap and how to port to it - Optimize iomap_read_folio() - Bring back the change to iomap_write_end() to no increase i_size. This is accompanied by a change to xfs to reserve blocks for truncating large realtime inodes to avoid exposing stale data when iomap_write_end() stops increasing i_size" * tag 'vfs-6.11.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: don't increase i_size in iomap_write_end() xfs: reserve blocks for truncating large realtime inode Documentation: the design of iomap and how to port iomap: Optimize iomap_read_folio	2024-07-15 13:28:14 -07:00
Linus Torvalds	aff31330e0	vfs-6.11.pg_error -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZpEGSgAKCRCRxhvAZXjc opvwAQCBfq5sxn/P34MNheHAVJOkQlozaflLIRM/CRN60HXV3AEAiph0RJBszvDu VhJ9VZ21zypvpS34enBfPKp1ZmyHPwI= =hNqR -----END PGP SIGNATURE----- Merge tag 'vfs-6.11.pg_error' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull PG_error removal updates from Christian Brauner: "This contains work to remove almost all remaining users of PG_error from filesystems and filesystem helper libraries. An additional patch will be coming in via the jfs tree which tests the PG_error bit. Afterwards nothing will be testing it anymore and it's safe to remove all places which set or clear the PG_error bit. The goal is to fully remove PG_error by the next merge window" * tag 'vfs-6.11.pg_error' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: buffer: Remove calls to set and clear the folio error flag iomap: Remove calls to set and clear folio error flag vboxsf: Convert vboxsf_read_folio() to use a folio ufs: Remove call to set the folio error flag romfs: Convert romfs_read_folio() to use a folio reiserfs: Remove call to folio_set_error() orangefs: Remove calls to set/clear the error flag nfs: Remove calls to folio_set_error jffs2: Remove calls to set/clear the folio error flag hostfs: Convert hostfs_read_folio() to use a folio isofs: Convert rock_ridge_symlink_read_folio to use a folio hpfs: Convert hpfs_symlink_read_folio to use a folio efs: Convert efs_symlink_read_folio to use a folio cramfs: Convert cramfs_read_folio to use a folio coda: Convert coda_symlink_filler() to use folio_end_read() befs: Convert befs_symlink_read_folio() to use folio_end_read()	2024-07-15 11:08:14 -07:00
Zhang Yi	602f09f402	iomap: don't increase i_size in iomap_write_end() This reverts commit '0841ea4a3b41 ("iomap: keep on increasing i_size in iomap_write_end()")'. After xfs could zero out the tail blocks aligned to the allocation unitsize and convert the tail blocks to unwritten for realtime inode on truncate down, it couldn't expose any stale data when unaligned truncate down realtime inodes, so we could keep on keeping i_size for IOMAP_UNSHARE and IOMAP_ZERO in iomap_write_end(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240618142112.1315279-3-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-06-19 15:58:28 +02:00
Ritesh Harjani (IBM)	af4eb6f46f	iomap: Optimize iomap_read_folio iomap_readpage_iter() handles "uptodate blocks" and "not uptodate blocks" within a folio separately. This makes iomap_read_folio() to call into ->iomap_begin() to request for extent mapping even though it might already have an extent which is not fully processed. This happens when we either have a large folio or with bs < ps. In these cases we can have sub blocks which can be uptodate (say for e.g. due to previous writes). With iomap_read_folio_iter(), this is handled more efficiently by not calling ->iomap_begin() call until all the sub blocks with the current folio are processed. iomap_read_folio_iter() handles multiple sub blocks within a given folio but it's implementation logic is similar to how iomap_readahead_iter() handles multiple folios within a single mapped extent. Both of them iterate over a given range of folio/mapped extent and call iomap_readpage_iter() for reading. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/92ae9f3333c9a7e66214568d08f45664261c899c.1715067055.git.ritesh.list@gmail.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> cc: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-06-19 15:58:19 +02:00
Ritesh Harjani (IBM)	f5ceb1bbc9	iomap: Fix iomap_adjust_read_range for plen calculation If the extent spans the block that contains i_size, we need to handle both halves separately so that we properly zero data in the page cache for blocks that are entirely outside of i_size. But this is needed only when i_size is within the current folio under processing. "orig_pos + length > isize" can be true for all folios if the mapped extent length is greater than the folio size. That is making plen to break for every folio instead of only the last folio. So use orig_plen for checking if "orig_pos + orig_plen > isize". Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/a32e5f9a4fcfdb99077300c4020ed7ae61d6e0f9.1715067055.git.ritesh.list@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> cc: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-06-05 17:27:03 +02:00
Zhang Yi	0841ea4a3b	iomap: keep on increasing i_size in iomap_write_end() Commit '943bc0882ceb ("iomap: don't increase i_size if it's not a write operation")' breaks xfs with realtime device on generic/561, the problem is when unaligned truncate down a xfs realtime inode with rtextsize > 1 fs block, xfs only zero out the EOF block but doesn't zero out the tail blocks that aligned to rtextsize, so if we don't increase i_size in iomap_write_end(), it could expose stale data after we do an append write beyond the aligned EOF block. xfs should zero out the tail blocks when truncate down, but before we finish that, let's fix the issue by just revert the changes in iomap_write_end(). Fixes: `943bc0882c` ("iomap: don't increase i_size if it's not a write operation") Reported-by: Chandan Babu R <chandanbabu@kernel.org> Link: https://lore.kernel.org/linux-xfs/0b92a215-9d9b-3788-4504-a520778953c2@huaweicloud.com Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240603112222.2109341-1-yi.zhang@huaweicloud.com Tested-by: Chandan Babu R <chandanbabu@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-06-05 17:23:39 +02:00
Matthew Wilcox (Oracle)	1f56eedf7f	iomap: Remove calls to set and clear folio error flag The folio error flag is not checked anywhere, so we can remove the calls to set and clear it. Cc: Christian Brauner <brauner@kernel.org> Cc: linux-xfs@vger.kernel.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240530202110.2653630-16-willy@infradead.org Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-05-31 12:31:42 +02:00
Xu Yang	4e527d5841	iomap: fault in smaller chunks for non-large folio mappings Since commit (`5d8edfb900` "iomap: Copy larger chunks from userspace"), iomap will try to copy in larger chunks than PAGE_SIZE. However, if the mapping doesn't support large folio, only one page of maximum 4KB will be created and 4KB data will be writen to pagecache each time. Then, next 4KB will be handled in next iteration. This will cause potential write performance problem. If chunk is 2MB, total 512 pages need to be handled finally. During this period, fault_in_iov_iter_readable() is called to check iov_iter readable validity. Since only 4KB will be handled each time, below address space will be checked over and over again: start end - buf, buf+2MB buf+4KB, buf+2MB buf+8KB, buf+2MB ... buf+2044KB buf+2MB Obviously the checking size is wrong since only 4KB will be handled each time. So this will get a correct chunk to let iomap work well in non-large folio case. With this change, the write speed will be stable. Tested on ARM64 device. Before: - dd if=/dev/zero of=/dev/sda bs=400K count=10485 (334 MB/s) - dd if=/dev/zero of=/dev/sda bs=800K count=5242 (278 MB/s) - dd if=/dev/zero of=/dev/sda bs=1600K count=2621 (204 MB/s) - dd if=/dev/zero of=/dev/sda bs=2200K count=1906 (170 MB/s) - dd if=/dev/zero of=/dev/sda bs=3000K count=1398 (150 MB/s) - dd if=/dev/zero of=/dev/sda bs=4500K count=932 (139 MB/s) After: - dd if=/dev/zero of=/dev/sda bs=400K count=10485 (339 MB/s) - dd if=/dev/zero of=/dev/sda bs=800K count=5242 (330 MB/s) - dd if=/dev/zero of=/dev/sda bs=1600K count=2621 (332 MB/s) - dd if=/dev/zero of=/dev/sda bs=2200K count=1906 (333 MB/s) - dd if=/dev/zero of=/dev/sda bs=3000K count=1398 (333 MB/s) - dd if=/dev/zero of=/dev/sda bs=4500K count=932 (333 MB/s) Fixes: `5d8edfb900` ("iomap: Copy larger chunks from userspace") Cc: stable@vger.kernel.org Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20240521114939.2541461-2-xu.yang_2@nxp.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-05-24 13:34:07 +02:00
Linus Torvalds	ff9a79307f	Kbuild updates for v6.10 - Avoid 'constexpr', which is a keyword in C23 - Allow 'dtbs_check' and 'dt_compatible_check' run independently of 'dt_binding_check' - Fix weak references to avoid GOT entries in position-independent code generation - Convert the last use of 'optional' property in arch/sh/Kconfig - Remove support for the 'optional' property in Kconfig - Remove support for Clang's ThinLTO caching, which does not work with the .incbin directive - Change the semantics of $(src) so it always points to the source directory, which fixes Makefile inconsistencies between upstream and downstream - Fix 'make tar-pkg' for RISC-V to produce a consistent package - Provide reasonable default coverage for objtool, sanitizers, and profilers - Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc. - Remove the last use of tristate choice in drivers/rapidio/Kconfig - Various cleanups and fixes in Kconfig -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmZFlGcVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsG8voQALC8NtFpduWVfLRj2Qg6Ll/xf1vX 2igcTJEOFHkeqXLGoT8dTDKLEipUBUvKyguPq66CGwVTe2g6zy/nUSXeVtFrUsIa msLTi8FqhqUo5lodNvGMRf8qqmuqcvnXoiQwIocF92jtsFy14bhiFY+n4HfcFNjj GOKwqBZYQUwY/VVb090efc7RfS9c7uwABJSBelSoxg3AGZriwjGy7Pw5aSKGgVYi inqL1eR6qwPP6z7CgQWM99soP+zwybFZmnQrsD9SniRBI4rtAat8Ih5jQFaSUFUQ lk2w0NQBRFN88/uR2IJ2GWuIlQ74WeJ+QnCqVuQ59tV5zw90wqSmLzngfPD057Dv JjNuhk0UyXVtpIg3lRtd4810ppNSTe33b9OM4O2H846W/crju5oDRNDHcflUXcwm Rmn5ho1rb5QVzDVejJbgwidnUInSgJ9PZcvXQ/RJVZPhpgsBzAY9pQexG1G3hviw y9UDrt6KP6bF9tHjmolmtdIes9Pj0c4dN6/Rdj4HS4hIQ/GDar0tnwvOvtfUctNL orJlBsA6GeMmDVXKkR0ytOCWRYqWWbyt8g70RVKQJfuHX7/hGyAQPaQ2/u4mQhC2 aevYfbNJMj0VDfGz81HDBKFtkc5n+Ite8l157dHEl2LEabkOkRdNVcn7SNbOvZmd ZCSnZ31h7woGfNho =D5B/ -----END PGP SIGNATURE----- Merge tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Avoid 'constexpr', which is a keyword in C23 - Allow 'dtbs_check' and 'dt_compatible_check' run independently of 'dt_binding_check' - Fix weak references to avoid GOT entries in position-independent code generation - Convert the last use of 'optional' property in arch/sh/Kconfig - Remove support for the 'optional' property in Kconfig - Remove support for Clang's ThinLTO caching, which does not work with the .incbin directive - Change the semantics of $(src) so it always points to the source directory, which fixes Makefile inconsistencies between upstream and downstream - Fix 'make tar-pkg' for RISC-V to produce a consistent package - Provide reasonable default coverage for objtool, sanitizers, and profilers - Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc. - Remove the last use of tristate choice in drivers/rapidio/Kconfig - Various cleanups and fixes in Kconfig * tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (46 commits) kconfig: use sym_get_choice_menu() in sym_check_prop() rapidio: remove choice for enumeration kconfig: lxdialog: remove initialization with A_NORMAL kconfig: m/nconf: merge two item_add_str() calls kconfig: m/nconf: remove dead code to display value of bool choice kconfig: m/nconf: remove dead code to display children of choice members kconfig: gconf: show checkbox for choice correctly kbuild: use GCOV_PROFILE and KCSAN_SANITIZE in scripts/Makefile.modfinal Makefile: remove redundant tool coverage variables kbuild: provide reasonable defaults for tool coverage modules: Drop the .export_symbol section from the final modules kconfig: use menu_list_for_each_sym() in sym_check_choice_deps() kconfig: use sym_get_choice_menu() in conf_write_defconfig() kconfig: add sym_get_choice_menu() helper kconfig: turn defaults and additional prompt for choice members into error kconfig: turn missing prompt for choice members into error kconfig: turn conf_choice() into void function kconfig: use linked list in sym_set_changed() kconfig: gconf: use MENU_CHANGED instead of SYMBOL_CHANGED kconfig: gconf: remove debug code ...	2024-05-18 12:39:20 -07:00
Masahiro Yamada	b1992c3772	kbuild: use $(src) instead of $(srctree)/$(src) for source directory Kbuild conventionally uses $(obj)/ for generated files, and $(src)/ for checked-in source files. It is merely a convention without any functional difference. In fact, $(obj) and $(src) are exactly the same, as defined in scripts/Makefile.build: src := $(obj) When the kernel is built in a separate output directory, $(src) does not accurately reflect the source directory location. While Kbuild resolves this discrepancy by specifying VPATH=$(srctree) to search for source files, it does not cover all cases. For example, when adding a header search path for local headers, -I$(srctree)/$(src) is typically passed to the compiler. This introduces inconsistency between upstream and downstream Makefiles because $(src) is used instead of $(srctree)/$(src) for the latter. To address this inconsistency, this commit changes the semantics of $(src) so that it always points to the directory in the source tree. Going forward, the variables used in Makefiles will have the following meanings: $(obj) - directory in the object tree $(src) - directory in the source tree (changed by this commit) $(objtree) - the top of the kernel object tree $(srctree) - the top of the kernel source tree Consequently, $(srctree)/$(src) in upstream Makefiles need to be replaced with $(src). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>	2024-05-10 04:34:52 +09:00
Zhang Yi	e1f453d433	iomap: do some small logical cleanup in buffered write Since iomap_write_end() can never return a partial write length, the comparison between written, copied and bytes becomes useless, just merge them with the unwritten branch. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-10-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-04-25 14:23:54 +02:00
Zhang Yi	815f4b633b	iomap: make iomap_write_end() return a boolean For now, we can make sure iomap_write_end() always return 0 or copied bytes, so instead of return written bytes, convert to return a boolean to indicate the copied bytes have been written to the pagecache. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-9-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-04-25 14:23:54 +02:00
Zhang Yi	1a61d74932	iomap: use a new variable to handle the written bytes in iomap_write_iter() In iomap_write_iter(), the status variable used to receive the return value from iomap_write_end() is confusing, replace it with a new written variable to represent the written bytes in each cycle, no logic changes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-8-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-04-25 14:23:54 +02:00
Zhang Yi	943bc0882c	iomap: don't increase i_size if it's not a write operation Increase i_size in iomap_zero_range() and iomap_unshare_iter() is not needed, the caller should handle it. Especially, when truncate partial block, we should not increase i_size beyond the new EOF here. It doesn't affect xfs and gfs2 now because they set the new file size after zero out, it doesn't matter that a transient increase in i_size, but it will affect ext4 because it set file size before truncate. So move the i_size updating logic to iomap_write_iter(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-7-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-04-25 14:23:54 +02:00
Zhang Yi	89c6c1d91a	iomap: drop the write failure handles when unsharing and zeroing Unsharing and zeroing can only happen within EOF, so there is never a need to perform posteof pagecache truncation if write begin fails, also partial write could never theoretically happened from iomap_write_end(), so remove both of them. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-6-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-04-25 14:23:53 +02:00
Christoph Hellwig	0fac04e4e0	iomap: convert iomap_writepages to writeack_iter This removes one indirect function call per folio, and adds type safety by not casting through a void pointer. Based on a patch by Matthew Wilcox. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240412061614.1511629-1-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-04-15 14:25:55 +02:00
Christian Brauner	86835c39e0	vfs-6.9.rw_hint -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZdcKwQAKCRCRxhvAZXjc oldXAP4uzKixPvJeJmmuLs8Yl2X4g4SnxXFoLwMjCOxGSH1DWQD+Oj0nGs81lIKm iLCZwk09JzfVEat/6KVmkjiqLLTwNgw= =TmTQ -----END PGP SIGNATURE----- gpgsig -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZeYHOAAKCRCRxhvAZXjc opwvAP0fqxfEAS04/MNdYSf0dA5GMr8v+8RBablWtkVuOMMbRQD/RMFJKXK02afq B4YUemRHtYETdbV69+yzninHy8y4gQQ= =ThqF -----END PGP SIGNATURE----- Merge tag 'vfs-6.9.rw_hint' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull write hint fix from Christian Brauner: UFS devices are widely used in mobile applications, e.g. in smartphones. UFS vendors need data lifetime information to achieve good performance. Providing data lifetime information to UFS devices can result in up to 40% lower write amplification. Hence this patch series that restores the bi_write_hint member in struct bio. After this patch series has been merged, patches that implement data lifetime support in the SCSI disk (sd) driver will be sent to the Linux kernel SCSI maintainer. The following changes are included in this patch series: - Improvements for the F_GET_RW_HINT and F_SET_RW_HINT fcntls. - Move enum rw_hint into a new header file. - Support F_SET_RW_HINT for block devices to make it easy to test data lifetime support. - Restore the bio.bi_write_hint member and restore support in the VFS layer and also in the block layer for data lifetime information. The shell script that has been used to test the patch series combined with the SCSI patches is available at the end of this cover letter. * tag 'vfs-6.9.rw_hint' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: block, fs: Restore the per-bio/request data lifetime fields fs: Propagate write hints to the struct block_device inode fs: Move enum rw_hint into a new header file fs: Split fcntl_rw_hint() fs: Verify write lifetime constants at compile time fs: Fix rw_hint validation Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-03-04 18:35:21 +01:00
Kassey Li	dcd04ea587	iomap: Add processed for iomap_iter processed: The number of bytes processed by the body in the most recent iteration, or a negative errno. 0 causes the iteration to stop. The processed is useful to check when the loop breaks. Signed-off-by: Kassey Li <quic_yingangl@quicinc.com> Link: https://lore.kernel.org/r/20240219021138.3481763-1-quic_yingangl@quicinc.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-02-21 08:37:50 +01:00
Zhang Yi	54943abce0	iomap: add pos and dirty_len into trace_iomap_writepage_map Since commit fd07e0aa23c4 ("iomap: map multiple blocks at a time"), we could map multi-blocks once a time, and the dirty_len indicates the expected map length, map_len won't large than it. The pos and dirty_len means the dirty range that should be mapped to write, add them into trace_iomap_writepage_map() could be more useful for debug. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240220115759.3445025-1-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-02-21 08:35:20 +01:00
Bart Van Assche	449813515d	block, fs: Restore the per-bio/request data lifetime fields Restore support for passing data lifetime information from filesystems to block drivers. This patch reverts commit `b179c98f76` ("block: Remove request.write_hint") and commit `c75e707fe1` ("block: remove the per-bio/request write hint"). This patch does not modify the size of struct bio because the new bi_write_hint member fills a hole in struct bio. pahole reports the following for struct bio on an x86_64 system with this patch applied: /* size: 112, cachelines: 2, members: 20 / / sum members: 110, holes: 1, sum holes: 2 / / last cacheline: 48 bytes */ Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240202203926.2478590-7-bvanassche@acm.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-02-06 14:31:05 +01:00
Christoph Hellwig	19871b5c7a	iomap: pass the length of the dirty region to ->map_blocks Let the file system know how much dirty data exists at the passed in offset. This allows file systems to allocate the right amount of space that actually is written back if they can't eagerly convert (e.g. because they don't support unwritten extents). Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231207072710.176093-15-hch@lst.de Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-02-01 14:20:13 +01:00
Christoph Hellwig	30deff8531	iomap: map multiple blocks at a time The ->map_blocks interface returns a valid range for writeback, but we still call back into it for every block, which is a bit inefficient. Change iomap_writepage_map to use the valid range in the map until the end of the folio or the dirty range inside the folio instead of calling back into every block. Note that the range is not used over folio boundaries as we need to be able to check the mapping sequence count under the folio lock. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231207072710.176093-14-hch@lst.de Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-02-01 14:20:13 +01:00
Christoph Hellwig	410bb2ce61	iomap: submit ioends immediately Currently the writeback code delays submitting fill ioends until we reach the end of the folio. The reason for that is that otherwise the end I/O handler could clear the writeback bit before we've even finished submitting all I/O for the folio. Add a bias to ifs->write_bytes_pending while we are submitting I/O for a folio so that it never reaches zero until all I/O is completed to prevent the early writeback bit clearing, and remove the now superfluous submit_list. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231207072710.176093-13-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-02-01 14:20:13 +01:00
Christoph Hellwig	f525152a0f	iomap: factor out a iomap_writepage_map_block helper Split the loop body that calls into the file system to map a block and add it to the ioend into a separate helper to prefer for refactoring of the surrounding code. Note that this was the only place in iomap_writepage_map that could return an error, so include the call to ->discard_folio into the new helper as that will help to avoid code duplication in the future. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231207072710.176093-12-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-02-01 14:20:12 +01:00
Christoph Hellwig	6b865d6530	iomap: only call mapping_set_error once for each failed bio Instead of clling mapping_set_error once per folio, only do that once per bio, and consolidate all the writeback error handling code in iomap_finish_ioend. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231207072710.176093-11-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-02-01 14:20:12 +01:00

1 2 3 4 5 ...

366 Commits