It's a pretty small update including mostly minor bug fixes in zoned storage
along with the large section support.
Enhancement:
- add support for FS_IOC_GETFSSYSFSPATH
- enable atgc dynamically if conditions are met
- use new ioprio Macro to get ckpt thread ioprio level
- remove unreachable lazytime mount option parsing
Bug fix:
- fix null reference error when checking end of zone
- fix start segno of large section
- fix to cover read extent cache access with lock
- don't dirty inode for readonly filesystem
- allocate a new section if curseg is not the first seg in its zone
- only fragment segment in the same section
- truncate preallocated blocks in f2fs_file_open()
- fix to avoid use SSR allocate when do defragment
- fix to force buffered IO on inline_data inode
And, it includes some minor code clean-ups, and sanity checks.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmagGLkACgkQQBSofoJI
UNLsvA//U1u2hr+VEmSIxZ+CcM8vBM7wmbuggdUikEW0uj07YpvovLikifV7p6kK
00p/GsqIqNRsVcTxRI9wBTPiltJRei/w6K3EXnSGKgTPtq1QMSv/GKiBUUaYsRu0
F6W5AqouTquDZz61/ULhMc7WvWqUIZ1m4QX/DMEUGPSnQ2+yIsnz/PT4ZXaKBH7K
lIh4WiFAyKO6/UWftcGmnvPiqj4YvqFOhLLV/fgF/VY8IVcENrDH+8+SJM2NtT0F
6gT0bN2Jscc8o43ejo6dlwc7+0qhmH7H2IOCC1XSYGCsveUYgqgKgpBP4ryKjZvt
LrbYKaL+auGuJMcLYCG/6IDPl5xkJo3SuRE7YnJdeTNc3InC6BUr17pkmU8n5ib4
xKSeH2XQXk/nu3l9srtKb87Zdwjr90GgvjEZwsCTe+6ihjJ7SGWfpvVLhm3pHale
SHPSLaVGqTlqdrNLtfhtNEg6xcvUVxTPbqzoCAmS6onEZfv8BldtQDSea0Tuw7UG
Ic4AbfJ/gVCKyCDw/QiV0B1n8GHsVIhlBXss2/xEuO2/2Pso8YFIAXCyH0kBXIN2
0/VesfguJLBIGyyFZ2M5AGZehr5s1n2IThe+qGjeoHfNQz7Br+xBTc25VpowUenC
nET3UoAmUkLFrItDMMqJbJ8DwW/Idei+YH/xnDZSKkz5rgHclsg=
=4m67
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"A pretty small update including mostly minor bug fixes in zoned
storage along with the large section support.
Enhancements:
- add support for FS_IOC_GETFSSYSFSPATH
- enable atgc dynamically if conditions are met
- use new ioprio Macro to get ckpt thread ioprio level
- remove unreachable lazytime mount option parsing
Bug fixes:
- fix null reference error when checking end of zone
- fix start segno of large section
- fix to cover read extent cache access with lock
- don't dirty inode for readonly filesystem
- allocate a new section if curseg is not the first seg in its zone
- only fragment segment in the same section
- truncate preallocated blocks in f2fs_file_open()
- fix to avoid use SSR allocate when do defragment
- fix to force buffered IO on inline_data inode
And some minor code clean-ups and sanity checks"
* tag 'f2fs-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (26 commits)
f2fs: clean up addrs_per_{inode,block}()
f2fs: clean up F2FS_I()
f2fs: use meta inode for GC of COW file
f2fs: use meta inode for GC of atomic file
f2fs: only fragment segment in the same section
f2fs: fix to update user block counts in block_operations()
f2fs: remove unreachable lazytime mount option parsing
f2fs: fix null reference error when checking end of zone
f2fs: fix start segno of large section
f2fs: remove redundant sanity check in sanity_check_inode()
f2fs: assign CURSEG_ALL_DATA_ATGC if blkaddr is valid
f2fs: fix to use mnt_{want,drop}_write_file replace file_{start,end}_wrtie
f2fs: clean up set REQ_RAHEAD given rac
f2fs: enable atgc dynamically if conditions are met
f2fs: fix to truncate preallocated blocks in f2fs_file_open()
f2fs: fix to cover read extent cache access with lock
f2fs: fix return value of f2fs_convert_inline_inode()
f2fs: use new ioprio Macro to get ckpt thread ioprio level
f2fs: fix to don't dirty inode for readonly filesystem
f2fs: fix to avoid use SSR allocate when do defragment
...
Introduce a new help addrs_per_page() to wrap common code
from addrs_per_inode() and addrs_per_block() for cleanup.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In case of the COW file, new updates and GC writes are already
separated to page caches of the atomic file and COW file. As some cases
that use the meta inode for GC, there are some race issues between a
foreground thread and GC thread.
To handle them, we need to take care when to invalidate and wait
writeback of GC pages in COW files as the case of using the meta inode.
Also, a pointer from the COW inode to the original inode is required to
check the state of original pages.
For the former, we can solve the problem by using the meta inode for GC
of COW files. Then let's get a page from the original inode in
move_data_block when GCing the COW file to avoid race condition.
Fixes: 3db1de0e58 ("f2fs: change the current atomic write way")
Cc: stable@vger.kernel.org #v5.19+
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yeongjin Gil <youngjin.gil@samsung.com>
Signed-off-by: Sunmin Jeong <s_min.jeong@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The page cache of the atomic file keeps new data pages which will be
stored in the COW file. It can also keep old data pages when GCing the
atomic file. In this case, new data can be overwritten by old data if a
GC thread sets the old data page as dirty after new data page was
evicted.
Also, since all writes to the atomic file are redirected to COW inodes,
GC for the atomic file is not working well as below.
f2fs_gc(gc_type=FG_GC)
- select A as a victim segment
do_garbage_collect
- iget atomic file's inode for block B
move_data_page
f2fs_do_write_data_page
- use dn of cow inode
- set fio->old_blkaddr from cow inode
- seg_freed is 0 since block B is still valid
- goto gc_more and A is selected as victim again
To solve the problem, let's separate GC writes and updates in the atomic
file by using the meta inode for GC writes.
Fixes: 3db1de0e58 ("f2fs: change the current atomic write way")
Cc: stable@vger.kernel.org #v5.19+
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yeongjin Gil <youngjin.gil@samsung.com>
Signed-off-by: Sunmin Jeong <s_min.jeong@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Now atgc can only be enabled when umounted->mounted device
even related conditions have reached. If the device has not
be umounted->mounted for a long time, atgc can not work.
So enable atgc dynamically when atgc_age_threshold is less than
elapsed_time and ATGC mount option is on.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
chenyuwen reports a f2fs bug as below:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000011
fscrypt_set_bio_crypt_ctx+0x78/0x1e8
f2fs_grab_read_bio+0x78/0x208
f2fs_submit_page_read+0x44/0x154
f2fs_get_read_data_page+0x288/0x5f4
f2fs_get_lock_data_page+0x60/0x190
truncate_partial_data_page+0x108/0x4fc
f2fs_do_truncate_blocks+0x344/0x5f0
f2fs_truncate_blocks+0x6c/0x134
f2fs_truncate+0xd8/0x200
f2fs_iget+0x20c/0x5ac
do_garbage_collect+0x5d0/0xf6c
f2fs_gc+0x22c/0x6a4
f2fs_disable_checkpoint+0xc8/0x310
f2fs_fill_super+0x14bc/0x1764
mount_bdev+0x1b4/0x21c
f2fs_mount+0x20/0x30
legacy_get_tree+0x50/0xbc
vfs_get_tree+0x5c/0x1b0
do_new_mount+0x298/0x4cc
path_mount+0x33c/0x5fc
__arm64_sys_mount+0xcc/0x15c
invoke_syscall+0x60/0x150
el0_svc_common+0xb8/0xf8
do_el0_svc+0x28/0xa0
el0_svc+0x24/0x84
el0t_64_sync_handler+0x88/0xec
It is because inode.i_crypt_info is not initialized during below path:
- mount
- f2fs_fill_super
- f2fs_disable_checkpoint
- f2fs_gc
- f2fs_iget
- f2fs_truncate
So, let's relocate truncation of preallocated blocks to f2fs_file_open(),
after fscrypt_file_open().
Fixes: d4dd19ec1e ("f2fs: do not expose unwritten blocks to user by DIO")
Reported-by: chenyuwen <yuwen.chen@xjmz.com>
Closes: https://lore.kernel.org/linux-kernel/20240517085327.1188515-1-yuwen.chen@xjmz.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
inode can be fuzzed, so it can has F2FS_INLINE_DATA flag and valid
i_blocks/i_nid value, this patch supports to do extra sanity check
to detect such corrupted state.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Keeping it as qstr avoids the unnecessary conversion in f2fs_match
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
[eugen.hristev@collabora.com: port to 6.10-rc1 and minor changes]
Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com>
Link: https://lore.kernel.org/r/20240606073353.47130-3-eugen.hristev@collabora.com
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
When building without CONFIG_F2FS_FAULT_INJECTION, there is a warning
from each file that includes f2fs.h because the stub for
f2fs_build_fault_attr() is missing inline:
In file included from fs/f2fs/segment.c:21:
fs/f2fs/f2fs.h:4605:12: warning: 'f2fs_build_fault_attr' defined but not used [-Wunused-function]
4605 | static int f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned long rate,
| ^~~~~~~~~~~~~~~~~~~~~
Add the missing inline to resolve all of the warnings for this
configuration.
Fixes: 4ed886b187 ("f2fs: check validation of fault attrs in f2fs_build_fault_attr()")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If inc_valid_block_count() can not allocate all requested blocks,
it needs to release block count in .total_valid_block_count and
resevation blocks in inode.
Fixes: 5460749487 ("f2fs: compress: fix to avoid inconsistence bewteen i_blocks and dnode")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
- It missed to check validation of fault attrs in parse_options(),
let's fix to add check condition in f2fs_build_fault_attr().
- Use f2fs_build_fault_attr() in __sbi_store() to clean up code.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
type of f2fs_inode.i_gc_failures, f2fs_inode_info.i_gc_failures, and
f2fs_sb_info.gc_pin_file_threshold is __le16, unsigned int, and u64,
so it will cause truncation during comparison and persistence.
Unifying variable of these three variables to unsigned short, and
add an upper boundary limitation for gc_pin_file_threshold.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
After commit 3db1de0e58 ("f2fs: change the current atomic write way"),
we removed all GC_FAILURE_ATOMIC usage, let's change i_gc_failures[]
array to i_pin_failure for cleanup.
Meanwhile, let's define i_current_depth and i_gc_failures as union
variable due to they won't be valid at the same time.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If active_log is not 6, we never use WARM_DATA segment, let's
avoid allocating WARM_DATA segment for direct IO.
Signed-off-by: Yunlei He <heyunlei@oppo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Convert f2fs_read_inline_data() to use folio and related
functionality, and also convert its caller to use folio.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This reverts commit 930e260763 ("f2fs: remove obsolete whint_mode"), as we
decide to pass write hints to the disk.
Cc: Hyunchul Lee <cheol.lee@lge.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If the max open zones of zoned devices are less than
the active logs of F2FS, the device may error due to
insufficient zone resources when multiple active logs
are being written at the same time.
Signed-off-by: Wenjie Qi <qwjhust@gmail.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Support .shutdown callback in f2fs_sops, then, it can be called to
shut down the file system when underlying block device is marked dead.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In this round, there are a number of updates on mainly two areas: Zoned block
device support and Per-file compression. For example, we've found several issues
to support Zoned block device especially having large sections regarding to GC
and file pinning used for Android devices. In compression side, we've fixed many
corner race conditions that had broken the design assumption.
Enhancement:
- Support file pinning for Zoned block device having large section
- Enhance the data recovery after sudden power cut on Zoned block device
- Add more error injection cases to easily detect the kernel panics
- add a proc entry show the entire disk layout
- Improve various error paths paniced by BUG_ON in block allocation and GC
- support SEEK_DATA and SEEK_HOLE for compression files
Bug fix:
- fix to avoid use-after-free issue in f2fs_filemap_fault
- fix some race conditions to break the atomic write design assumption
- fix to truncate meta inode pages forcely
- resolve various per-file compression issues wrt the space management and
compression policies
- fix some swap-related bugs
In addition, we removed deprecated codes such as io_bits and heap_allocation,
and also fixed minor error handling routines with neat debugging messages.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmX4gS0ACgkQQBSofoJI
UNLmgBAAg4mvbWjmJ5VbXs4zGLOgLRJYcY1sZRO5Ufg4LhWzoGRxL1Dru+TELw0t
1Ck2EQvP91XZ5weA5AZOfWbxcijy4+8L3P8L7ohOShudfACci0wQsx6IaUUWWylC
ILA4+DkovpZrlu6th12Gj9QAM6TN9gdy3V1VLT5O/KmE1x6Pekwp2hQoIvVJRH5L
I3KxOf5fTe3oWLvEN6m7yCz/8qGqz8+w0ae90UG0fqi0wVEuZJ99zsVPnuhu6uBo
riFm2A6ra0I/JqoPyqn2QM6ApItM867ULo9EoyQVgq56Q1w31ENOJXsU9N7N4Wxt
olgujH1SijkWk9ni57iKtMhR68e3Rs+pVsuNFmJuOPq0HASoggB66QRrVvCgM9JG
z3D//CB2ONtX2XiKJMiTcX9VqIqrMw6L1eVxEZu0P96C3CS70MoBU69mdSR9Og2S
5nQXja3yzFhdk3thp6+wAJ3I04ZQkf3qoHZB+0chU2Xl1pV+5NIkBgBsSw8g/TY3
EIHMfK+TX0SBSNCvkUDEJ+Z8ZRID6tcbAquTSsBr6wxB+F9mq7onEvI8O7xwyH9W
DU8xhymOE2QUoluNtyW7ww6HK913ripXIenI9LaYJnuj0XeDAcMIoPsgR7AGU5UG
hshvirFdUdWRMTfXxNNUrvhOWI0qurQSVx+VV6Qb62DGqR5ofOw=
=Qpvy
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs update from Jaegeuk Kim:
"In this round, there are a number of updates on mainly two areas:
Zoned block device support and Per-file compression. For example,
we've found several issues to support Zoned block device especially
having large sections regarding to GC and file pinning used for
Android devices. In compression side, we've fixed many corner race
conditions that had broken the design assumption.
Enhancements:
- Support file pinning for Zoned block device having large section
- Enhance the data recovery after sudden power cut on Zoned block
device
- Add more error injection cases to easily detect the kernel panics
- add a proc entry show the entire disk layout
- Improve various error paths paniced by BUG_ON in block allocation
and GC
- support SEEK_DATA and SEEK_HOLE for compression files
Bug fixes:
- avoid use-after-free issue in f2fs_filemap_fault
- fix some race conditions to break the atomic write design
assumption
- fix to truncate meta inode pages forcely
- resolve various per-file compression issues wrt the space
management and compression policies
- fix some swap-related bugs
In addition, we removed deprecated codes such as io_bits and
heap_allocation, and also fixed minor error handling routines with
neat debugging messages"
* tag 'f2fs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (60 commits)
f2fs: fix to avoid use-after-free issue in f2fs_filemap_fault
f2fs: truncate page cache before clearing flags when aborting atomic write
f2fs: mark inode dirty for FI_ATOMIC_COMMITTED flag
f2fs: prevent atomic write on pinned file
f2fs: fix to handle error paths of {new,change}_curseg()
f2fs: unify the error handling of f2fs_is_valid_blkaddr
f2fs: zone: fix to remove pow2 check condition for zoned block device
f2fs: fix to truncate meta inode pages forcely
f2fs: compress: fix reserve_cblocks counting error when out of space
f2fs: compress: relocate some judgments in f2fs_reserve_compress_blocks
f2fs: add a proc entry show disk layout
f2fs: introduce SEGS_TO_BLKS/BLKS_TO_SEGS for cleanup
f2fs: fix to check return value of f2fs_gc_range
f2fs: fix to check return value __allocate_new_segment
f2fs: fix to do sanity check in update_sit_entry
f2fs: fix to reset fields for unloaded curseg
f2fs: clean up new_curseg()
f2fs: relocate f2fs_precache_extents() in f2fs_swap_activate()
f2fs: fix blkofs_end correctly in f2fs_migrate_blocks()
f2fs: ro: don't start discard thread for readonly image
...
In f2fs_update_inode, i_size of the atomic file isn't updated until
FI_ATOMIC_COMMITTED flag is set. When committing atomic write right
after the writeback of the inode, i_size of the raw inode will not be
updated. It can cause the atomicity corruption due to a mismatch between
old file size and new data.
To prevent the problem, let's mark inode dirty for FI_ATOMIC_COMMITTED
Atomic write thread Writeback thread
__writeback_single_inode
write_inode
f2fs_update_inode
- skip i_size update
f2fs_ioc_commit_atomic_write
f2fs_commit_atomic_write
set_inode_flag(inode, FI_ATOMIC_COMMITTED)
f2fs_do_sync_file
f2fs_fsync_node_pages
- skip f2fs_update_inode since the inode is clean
Fixes: 3db1de0e58 ("f2fs: change the current atomic write way")
Cc: stable@vger.kernel.org #v5.19+
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yeongjin Gil <youngjin.gil@samsung.com>
Signed-off-by: Sunmin Jeong <s_min.jeong@samsung.com>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmXx5kwACgkQnJ2qBz9k
QNmZowf/UlGJ1rmQFFhoodn3SyK48tQjOZ23Ygx6v9FZiLMuQ3b1k0kWKmwM4lZb
mtRriCm+lPO9Yp/Sflz+jn8S51b/2bcTXiPV4w2Y4ZIun41wwggV7rWPnTCHhu94
rGEPu/SNSBdpxWGv43BKHSDl4XolsGbyusQKBbKZtftnrpIf0y2OnyEXSV91Vnlh
KM/XxzacBD4/3r4KCljyEkORWlIIn2+gdZf58sKtxLKvnfCIxjB+BF1e0gOWgmNQ
e/pVnzbAHO3wuavRlwnrtA+ekBYQiJq7T61yyYI8zpeSoLHmwvPoKSsZP+q4BTvV
yrcVCbGp3uZlXHD93U3BOfdqS0xBmg==
=84Q4
-----END PGP SIGNATURE-----
Merge tag 'fs_for_v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, isofs, udf, and quota updates from Jan Kara:
"A lot of material this time:
- removal of a lot of GFP_NOFS usage from ext2, udf, quota (either it
was legacy or replaced with scoped memalloc_nofs_*() API)
- removal of BUG_ONs in quota code
- conversion of UDF to the new mount API
- tightening quota on disk format verification
- fix some potentially unsafe use of RCU pointers in quota code and
annotate everything properly to make sparse happy
- a few other small quota, ext2, udf, and isofs fixes"
* tag 'fs_for_v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (26 commits)
udf: remove SLAB_MEM_SPREAD flag usage
quota: remove SLAB_MEM_SPREAD flag usage
isofs: remove SLAB_MEM_SPREAD flag usage
ext2: remove SLAB_MEM_SPREAD flag usage
ext2: mark as deprecated
udf: convert to new mount API
udf: convert novrs to an option flag
MAINTAINERS: add missing git address for ext2 entry
quota: Detect loops in quota tree
quota: Properly annotate i_dquot arrays with __rcu
quota: Fix rcu annotations of inode dquot pointers
isofs: handle CDs with bad root inode but good Joliet root directory
udf: Avoid invalid LVID used on mount
quota: Fix potential NULL pointer dereference
quota: Drop GFP_NOFS instances under dquot->dq_lock and dqio_sem
quota: Set nofs allocation context when acquiring dqio_sem
ext2: Remove GFP_NOFS use in ext2_xattr_cache_insert()
ext2: Drop GFP_NOFS use in ext2_get_blocks()
ext2: Drop GFP_NOFS allocation from ext2_init_block_alloc_info()
udf: Remove GFP_NOFS allocation in udf_expand_file_adinicb()
...
{new,change}_curseg() may return error in some special cases,
error handling should be did in their callers, and this will also
facilitate subsequent error path expansion in {new,change}_curseg().
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Below race case can cause data corruption:
Thread A GC thread
- gc_data_segment
- ra_data_block
- locked meta_inode page
- f2fs_inplace_write_data
- invalidate_mapping_pages
: fail to invalidate meta_inode page
due to lock failure or dirty|writeback
status
- f2fs_submit_page_bio
: write last dirty data to old blkaddr
- move_data_block
- load old data from meta_inode page
- f2fs_submit_page_write
: write old data to new blkaddr
Because invalidate_mapping_pages() will skip invalidating page which
has unclear status including locked, dirty, writeback and so on, so
we need to use truncate_inode_pages_range() instead of
invalidate_mapping_pages() to make sure meta_inode page will be dropped.
Fixes: 6aa58d8ad2 ("f2fs: readahead encrypted block during GC")
Fixes: e3b49ea368 ("f2fs: invalidate META_MAPPING before IPU/DIO write")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem4DwAKCRCRxhvAZXjc
ooTRAQDRI6Qz6wJym5Yblta8BScMGbt/SgrdgkoCvT6y83MtqwD+Nv/AZQzi3A3l
9NdULtniW1reuCYkc8R7dYM8S+yAwAc=
=Y1qX
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.9.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull block handle updates from Christian Brauner:
"Last cycle we changed opening of block devices, and opening a block
device would return a bdev_handle. This allowed us to implement
support for restricting and forbidding writes to mounted block
devices. It was accompanied by converting and adding helpers to
operate on bdev_handles instead of plain block devices.
That was already a good step forward but ultimately it isn't necessary
to have special purpose helpers for opening block devices internally
that return a bdev_handle.
Fundamentally, opening a block device internally should just be
equivalent to opening files. So now all internal opens of block
devices return files just as a userspace open would. Instead of
introducing a separate indirection into bdev_open_by_*() via struct
bdev_handle bdev_file_open_by_*() is made to just return a struct
file. Opening and closing a block device just becomes equivalent to
opening and closing a file.
This all works well because internally we already have a pseudo fs for
block devices and so opening block devices is simple. There's a few
places where we needed to be careful such as during boot when the
kernel is supposed to mount the rootfs directly without init doing it.
Here we need to take care to ensure that we flush out any asynchronous
file close. That's what we already do for opening, unpacking, and
closing the initramfs. So nothing new here.
The equivalence of opening and closing block devices to regular files
is a win in and of itself. But it also has various other advantages.
We can remove struct bdev_handle completely. Various low-level helpers
are now private to the block layer. Other helpers were simply
removable completely.
A follow-up series that is already reviewed build on this and makes it
possible to remove bdev->bd_inode and allows various clean ups of the
buffer head code as well. All places where we stashed a bdev_handle
now just stash a file and use simple accessors to get to the actual
block device which was already the case for bdev_handle"
* tag 'vfs-6.9.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits)
block: remove bdev_handle completely
block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access
bdev: remove bdev pointer from struct bdev_handle
bdev: make struct bdev_handle private to the block layer
bdev: make bdev_{release, open_by_dev}() private to block layer
bdev: remove bdev_open_by_path()
reiserfs: port block device access to file
ocfs2: port block device access to file
nfs: port block device access to files
jfs: port block device access to file
f2fs: port block device access to files
ext4: port block device access to file
erofs: port device access to file
btrfs: port device access to file
bcachefs: port block device access to file
target: port block device access to file
s390: port block device access to file
nvme: port block device access to file
block2mtd: port device access to files
bcache: port block device access to files
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem4UQAKCRCRxhvAZXjc
ouERAQDg63R9s3bKmUgGqngf9cfr//VCTE+WVARwOUTdn2iDbwEA1IME7X1kL/Vz
EdhEjyqO6xom+ao/Vqxe0XIDNz70vgs=
=8RdE
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.9.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull iomap updates from Christian Brauner:
- Restore read-write hints in struct bio through the bi_write_hint
member for the sake of UFS devices in mobile applications. This can
result in up to 40% lower write amplification in UFS devices. The
patch series that builds on this will be coming in via the SCSI
maintainers (Bart)
- Overhaul the iomap writeback code. Afterwards ->map_blocks() is able
to map multiple blocks at once as long as they're in the same folio.
This reduces CPU usage for buffered write workloads on e.g., xfs on
systems with lots of cores (Christoph)
- Record processed bytes in iomap_iter() trace event (Kassey)
- Extend iomap_writepage_map() trace event after Christoph's
->map_block() changes to map mutliple blocks at once (Zhang)
* tag 'vfs-6.9.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
iomap: Add processed for iomap_iter
iomap: add pos and dirty_len into trace_iomap_writepage_map
block, fs: Restore the per-bio/request data lifetime fields
fs: Propagate write hints to the struct block_device inode
fs: Move enum rw_hint into a new header file
fs: Split fcntl_rw_hint()
fs: Verify write lifetime constants at compile time
fs: Fix rw_hint validation
iomap: pass the length of the dirty region to ->map_blocks
iomap: map multiple blocks at a time
iomap: submit ioends immediately
iomap: factor out a iomap_writepage_map_block helper
iomap: only call mapping_set_error once for each failed bio
iomap: don't chain bios
iomap: move the iomap_sector sector calculation out of iomap_add_to_ioend
iomap: clean up the iomap_alloc_ioend calling convention
iomap: move all remaining per-folio logic into iomap_writepage_map
iomap: factor out a iomap_writepage_handle_eof helper
iomap: move the PF_MEMALLOC check to iomap_writepages
iomap: move the io_folios field out of struct iomap_ioend
...
__allocate_new_segment may return error when get_new_segment
fails, so its caller should check its return value.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Use it to simulate no free segment case during block allocation.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If CONFIG_F2FS_CHECK_FS is off, and for very rare corner case that
we run out of free segment, we should not panic kernel, instead,
let's handle such error correctly in its caller.
Signed-off-by: Chao Yu <chao@kernel.org>
Tested-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Commit 093749e296 ("f2fs: support age threshold based garbage
collection") added this declaration, but w/ definition, delete
it.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
There are very similar codes in inc_valid_block_count() and
inc_valid_node_count() which is used for available user block
count calculation.
This patch introduces a new helper get_available_block_count()
to include those common codes, and used it to clean up codes.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Support file pinning with conventional storage area for zoned devices
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
generic/700 - output mismatch (see /media/fstests/results//generic/700.out.bad)
--- tests/generic/700.out 2023-03-28 10:40:42.735529223 +0000
+++ /media/fstests/results//generic/700.out.bad 2024-02-06 04:37:56.000000000 +0000
@@ -1,2 +1,4 @@
QA output created by 700
+/mnt/scratch_f2fs/f1: security.selinux: No such attribute
+/mnt/scratch_f2fs/f2: security.selinux: No such attribute
Silence is golden
...
(Run 'diff -u /media/fstests/tests/generic/700.out /media/fstests/results//generic/700.out.bad' to see the entire diff)
HINT: You _MAY_ be missing kernel fix:
70b589a37e xfs: add selinux labels to whiteout inodes
Previously, it missed to create selinux labels during whiteout inode
initialization, fix this issue.
Fixes: 7e01e7ad74 ("f2fs: support RENAME_WHITEOUT")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Let's deprecate an unused io_bits feature to save CPU cycles and memory.
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Dquots pointed to from i_dquot arrays in inodes are protected by
dquot_srcu. Annotate them as such and change .get_dquots callback to
return properly annotated pointer to make sparse happy.
Fixes: b9ba6f94b2 ("quota: remove dqptr_sem")
Signed-off-by: Jan Kara <jack@suse.cz>
Move enum rw_hint into a new header file to prepare for using this data
type in the block layer. Add the attribute __packed to reduce the space
occupied by instances of this data type from four bytes to one byte.
Change the data type of i_write_hint from u8 into enum rw_hint.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Chao Yu <chao@kernel.org> # for the F2FS part
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240202203926.2478590-5-bvanassche@acm.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
During recovery, if FAULT_BLOCK is on, it is possible that
f2fs_reserve_new_block() will return -ENOSPC during recovery,
then it may trigger panic.
Also, if fault injection rate is 1 and only FAULT_BLOCK fault
type is on, it may encounter deadloop in loop of block reservation.
Let's change as below to fix these issues:
- remove bug_on() to avoid panic.
- limit the loop count of block reservation to avoid potential
deadloop.
Fixes: 956fa1ddc1 ("f2fs: fix to check return value of f2fs_reserve_new_block()")
Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch supports using printk_ratelimited() in f2fs_printk(), and
wrap ratelimited f2fs_printk() into f2fs_{err,warn,info}_ratelimited(),
then, use these new helps to clean up codes.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We will encounter below inconsistent status when FAULT_BLKADDR type
fault injection is on.
Info: checkpoint state = d6 : nat_bits crc fsck compacted_summary orphan_inodes sudden-power-off
[ASSERT] (fsck_chk_inode_blk:1254) --> ino: 0x1c100 has i_blocks: 000000c0, but has 191 blocks
[FIX] (fsck_chk_inode_blk:1260) --> [0x1c100] i_blocks=0x000000c0 -> 0xbf
[FIX] (fsck_chk_inode_blk:1269) --> [0x1c100] i_compr_blocks=0x00000026 -> 0x27
[ASSERT] (fsck_chk_inode_blk:1254) --> ino: 0x1cadb has i_blocks: 0000002f, but has 46 blocks
[FIX] (fsck_chk_inode_blk:1260) --> [0x1cadb] i_blocks=0x0000002f -> 0x2e
[FIX] (fsck_chk_inode_blk:1269) --> [0x1cadb] i_compr_blocks=0x00000011 -> 0x12
[ASSERT] (fsck_chk_inode_blk:1254) --> ino: 0x1c62c has i_blocks: 00000002, but has 1 blocks
[FIX] (fsck_chk_inode_blk:1260) --> [0x1c62c] i_blocks=0x00000002 -> 0x1
After we inject fault into f2fs_is_valid_blkaddr() during truncation,
a) it missed to increase @nr_free or @valid_blocks
b) it can cause in blkaddr leak in truncated dnode
Which may cause inconsistent status.
This patch separates FAULT_BLKADDR_CONSISTENCE from FAULT_BLKADDR,
and rename FAULT_BLKADDR to FAULT_BLKADDR_VALIDITY
so that we can:
a) use FAULT_BLKADDR_CONSISTENCE in f2fs_truncate_data_blocks_range()
to simulate inconsistent issue independently, then it can verify fsck
repair flow.
b) FAULT_BLKADDR_VALIDITY fault will not cause any inconsistent status,
we can just use it to check error path handling in kernel side.
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
verify_blkaddr() will trigger panic once we inject fault into
f2fs_is_valid_blkaddr(), fix to remove this unnecessary f2fs_bug_on().
Fixes: 18792e64c8 ("f2fs: support fault injection for f2fs_is_valid_blkaddr()")
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In reserve_compress_blocks(), we update blkaddrs of dnode in prior to
inc_valid_block_count(), it may cause inconsistent status bewteen
i_blocks and blkaddrs once inc_valid_block_count() fails.
To fix this issue, it needs to reverse their invoking order.
Fixes: c75488fb4d ("f2fs: introduce F2FS_IOC_RESERVE_COMPRESS_BLOCKS")
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If data block in compressed cluster is not persisted with metadata
during checkpoint, after SPOR, the data may be corrupted, let's
guarantee to write compressed page by checkpoint.
Fixes: 4c8ff7095b ("f2fs: support data compression")
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>