linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-22 04:02:20 +00:00

Author	SHA1	Message	Date
Gao Xiang	14c2d97265	erofs: use get_tree_bdev_flags() to avoid misleading messages Users can then pass in an arbitrary source path for the proper type of mount without getting a "Can't lookup blockdev" error message. Reported-by: Allison Karlitskaya <allison.karlitskaya@redhat.com> Closes: https://lore.kernel.org/r/CAOYeF9VQ8jKVmpy5Zy9DNhO6xmWSKMB-DO8yvBB0XvBE7=3Ugg@mail.gmail.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241009033151.2334888-2-hsiangkao@linux.alibaba.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-10-21 14:30:27 +02:00
Linus Torvalds	63fa605041	Changes since last update: - Make sure only regular inodes can be used for file-backed mounts; - Two minor codebase cleanups. Merge tag 'erofs-for-6.12-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "The main one fixes a syzbot issue due to the invalid inode type out of file-backed mounts. The others are minor cleanups without actual logic changes. Summary: - Make sure only regular inodes can be used for file-backed mounts - Two minor codebase cleanups" * tag 'erofs-for-6.12-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: get rid of kaddr in `struct z_erofs_maprecorder` erofs: get rid of z_erofs_try_to_claim_pcluster() erofs: ensure regular inodes for file-backed mounts	2024-10-14 11:12:09 -07:00
Gao Xiang	ae54567eaa	erofs: get rid of kaddr in `struct z_erofs_maprecorder` `kaddr` becomes useless after switching to metabuf. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241010235830.1535616-1-hsiangkao@linux.alibaba.com	2024-10-11 13:36:58 +08:00
Gao Xiang	2402082e53	erofs: get rid of z_erofs_try_to_claim_pcluster() Just fold it into the caller for simplicity. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241010090420.405871-1-hsiangkao@linux.alibaba.com	2024-10-11 13:36:58 +08:00
Gao Xiang	416a8b2c02	erofs: ensure regular inodes for file-backed mounts Only regular inodes are allowed for file-backed mounts, not directories (as seen in the original syzbot case) or special inodes. Also ensure that .read_folio() is implemented on the underlying fs for the primary device. Fixes: `fb17675026` ("erofs: add file-backed mount support") Reported-by: syzbot+001306cd9c92ce0df23f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/00000000000011bdde0622498ee3@google.com Tested-by: syzbot+001306cd9c92ce0df23f@syzkaller.appspotmail.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240917130803.32418-1-hsiangkao@linux.alibaba.com	2024-10-11 13:36:41 +08:00
Al Viro	5f60d5f6bb	move asm/unaligned.h to linux/unaligned.h asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h	2024-10-02 17:23:23 -04:00
Gao Xiang	025497e1d1	erofs: reject inodes with negative i_size Negative i_size is never supported, although crafted images with inodes having negative i_size will NOT lead to security issues in our current codebase: The following image can verify this (gzip+base64 encoded): H4sICCmk4mYAA3Rlc3QuaW1nAGNgGAWjYBSMVPDo4dcH3jP2aTED2TwMKgxMUHHNJY/SQDQX LxcDIw3tZwXit44MDNpQ/n8gQJZ/vxjijosPuSyZ0DUDgQqcZoKzVYFsDShbHeh6PT29ktTi Eqz2g/y2pBFiLxDMh4lhs5+W4TAKRsEoGAWjYBSMglEwCkYBPQAAS2DbowAQAAA= Mark as bad inodes for such corrupted inodes explicitly. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240912083538.3011860-1-hsiangkao@linux.alibaba.com	2024-09-12 23:00:09 +08:00
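The idea behind commit 025497e1d1 can be sketched as follows, assuming the raw on-disk size is read as an unsigned 64-bit value and later interpreted as a signed 64-bit file size. `demo_i_size_is_valid()` is a hypothetical name; how the kernel actually marks the inode as bad is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: a raw size with the top bit set would become
 * negative once interpreted as a signed 64-bit size, so reject it.
 */
bool demo_i_size_is_valid(uint64_t raw_size)
{
	return raw_size <= (uint64_t)INT64_MAX;
}
```

Any value at or above `1ULL << 63` fails the check, which corresponds to the "negative i_size" case the commit rejects.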
Gao Xiang	7c3ca1838a	erofs: restrict pcluster size limitations Error out if {en,de}coded size of a pcluster is unsupported: Maximum supported encoded size (of a pcluster): 1 MiB Maximum supported decoded size (of a pcluster): 12 MiB Users can still choose to use supported large configurations (e.g., for archival purposes), but there may be performance penalties in low-memory scenarios compared to smaller pclusters. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240912074156.2925394-1-hsiangkao@linux.alibaba.com	2024-09-12 23:00:09 +08:00
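The two limits stated in commit 7c3ca1838a (1 MiB encoded, 12 MiB decoded) translate into a simple bounds check. The sketch below uses hypothetical names (`DEMO_MAX_ENCODED_SIZE`, `DEMO_MAX_DECODED_SIZE`, `demo_check_pcluster_sizes()`), and the `-EOPNOTSUPP` return value is an assumption; the commit message only says the kernel errors out.

```c
#include <errno.h>
#include <stdint.h>

/* Limits taken from the commit message; the macro names are hypothetical. */
#define DEMO_MAX_ENCODED_SIZE	(1ULL << 20)	/* 1 MiB */
#define DEMO_MAX_DECODED_SIZE	(12ULL << 20)	/* 12 MiB */

/* Returns 0 if both sizes are within the supported limits. */
int demo_check_pcluster_sizes(uint64_t encoded, uint64_t decoded)
{
	if (encoded > DEMO_MAX_ENCODED_SIZE)
		return -EOPNOTSUPP;	/* assumed errno, for illustration */
	if (decoded > DEMO_MAX_DECODED_SIZE)
		return -EOPNOTSUPP;
	return 0;
}
```

Sizes exactly at the limits are accepted here; whether the kernel treats the boundary as inclusive is not stated in the message, so that detail is also an assumption.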
Chunhai Guo	79f504a2cd	erofs: allocate more short-lived pages from reserved pool first This patch aims to allocate bvpages and short-lived compressed pages from the reserved pool first. After applying this patch, there are three benefits. 1. It reduces the page allocation time. The bvpages and short-lived compressed pages account for about 4% of the pages allocated from the system in the multi-app launch benchmarks [1]. It reduces the page allocation time accordingly and lowers the likelihood of blockage by page allocation in low memory scenarios. 2. The pages in the reserved pool will be allocated on demand. Currently, bvpages and short-lived compressed pages are short-lived pages allocated from the system, and the pages in the reserved pool all originate from short-lived pages. Consequently, the number of reserved pool pages will increase to z_erofs_rsv_nrpages over time. With this patch, all short-lived pages are allocated from the reserved pool first, so the number of reserved pool pages will only increase when there are not enough pages. Thus, even if z_erofs_rsv_nrpages is set to a large number for specific reasons, the actual number of reserved pool pages may remain low as per demand. In the multi-app launch benchmarks [1], z_erofs_rsv_nrpages is set at 256, while the number of reserved pool pages remains below 64. 3. When erofs cache decompression is disabled (EROFS_ZIP_CACHE_DISABLED), all pages will only be allocated from the reserved pool for erofs. This will significantly reduce the memory pressure from erofs. [1] For additional details on the multi-app launch benchmarks, please refer to commit `0f6273ab46` ("erofs: add a reserved buffer pool for lz4 decompression"). 
Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240906121110.3701889-1-guochunhai@vivo.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-09-12 22:59:49 +08:00
Gao Xiang	2349d2fa02	erofs: sunset unneeded NOFAILs With iterative development, our codebase can now deal with compressed buffer misses properly if both in-place I/O and compressed buffer allocation fail. Note that if readahead fails (with non-uptodate folios), the original request will then fall back to synchronous read, and `.read_folio()` should return appropriate errnos; otherwise -EIO will be passed to user space, which is unexpected. To simplify rarely encountered failure paths, a mimic decompression will just be used. Before that, failure reasons are recorded in compressed_bvecs[] and they also act as placeholders to avoid in-place pages. They will be parsed just before decompression and then passed back to `.read_folio()`. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240905084732.2684515-1-hsiangkao@linux.alibaba.com	2024-09-12 20:26:43 +08:00
Hongzhen Luo	8bdb6a8393	erofs: simplify erofs_map_blocks_flatmode() Get rid of redundant variables (nblocks, offset) and a dead branch (!tailendpacking). Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240905030339.1474396-1-hongzhen@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-09-10 15:27:14 +08:00
Yiyang Wu	53d514b970	erofs: refactor read_inode calling convention Refactor the iop binding behavior out of erofs_fill_symlink and move erofs_buf into erofs_read_inode, so that erofs_fill_inode only deals with inode operation bindings and is decoupled from metabuf operations. This results in better calling conventions. Note that after this patch, we do not need erofs_buf and ofs as parameters any more when calling erofs_read_inode as all the data operations are now contained within it. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/all/20240425222847.GN2118490@ZenIV/ Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240902093412.509083-1-toolmanp@tlmp.cc Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-09-10 15:27:11 +08:00
Yiyang Wu	b1bbb9a637	erofs: use kmemdup_nul in erofs_fill_symlink Remove open coding in erofs_fill_symlink. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/all/20240425222847.GN2118490@ZenIV Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc> Link: https://lore.kernel.org/r/20240902083147.450558-2-toolmanp@tlmp.cc Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-09-10 15:27:11 +08:00
Gao Xiang	0d442ce0b3	erofs: mark experimental fscache backend deprecated Although fscache is still described as "General Filesystem Caching" for network filesystems and other things such as ISO9660 filesystems, it has actually become a part of netfslib recently, which was unexpected at the time when "EROFS over fscache" was proposed (2021) since EROFS is entirely a disk filesystem and the dependency is redundant. Mark it deprecated and it will be removed after "fanotify pre-content hooks" lands, which will provide the same functionality for EROFS. Reviewed-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240830032840.3783206-4-hsiangkao@linux.alibaba.com	2024-09-10 15:27:11 +08:00
Gao Xiang	283213718f	erofs: support compressed inodes for fileio Use pseudo bios just like the previous fscache approach since merged bio_vecs can be filled properly with unique interfaces. Reviewed-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240830032840.3783206-3-hsiangkao@linux.alibaba.com	2024-09-10 15:27:09 +08:00
Gao Xiang	ce63cb62d7	erofs: support unencoded inodes for fileio Since EROFS only needs to handle read requests in simple contexts, just directly use vfs_iocb_iter_read() for data I/Os. Reviewed-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240905093031.2745929-1-hsiangkao@linux.alibaba.com	2024-09-10 15:26:36 +08:00
Gao Xiang	fb17675026	erofs: add file-backed mount support It actually has been around for years: For containers and other sandbox use cases, there will be thousands (and even more) of authenticated (sub)images running on the same host, unlike OS images. Of course, all scenarios can use the same EROFS on-disk format, but bdev-backed mounts just work well for OS images since golden data is dumped into real block devices. However, it's somewhat hard for container runtimes to manage and isolate so many unnecessary virtual block devices safely and efficiently [1]: they just look like a burden to orchestrators and file-backed mounts are preferred indeed. There were already enough attempts such as Incremental FS, the original ComposeFS and PuzzleFS acting in the same way for immutable fses. As for current EROFS users, ComposeFS, containerd and Android APEXs will directly benefit from it. On the other hand, previous experimental feature "erofs over fscache" was once also intended to provide a similar solution (inspired by Incremental FS discussion [2]), but the following facts show file-backed mounts will be a better approach: - Fscache infrastructure has recently been moved into new Netfslib which is an unexpected dependency to EROFS really, although it originally claims "it could be used for caching other things such as ISO9660 filesystems too." [3] - It takes an unexpectedly long time to upstream Fscache/Cachefiles enhancements. For example, the failover feature took more than one year, and the daemonless feature is still far behind now; - Ongoing HSM "fanotify pre-content hooks" [4] together with this will perfectly supersede "erofs over fscache" in a simpler way since developers (mainly containerd folks) could leverage their existing caching mechanism entirely in userspace instead of strictly following the predefined in-kernel caching tree hierarchy. 
After "fanotify pre-content hooks" lands upstream to provide the same functionality, "erofs over fscache" will be removed then (as an EROFS internal improvement and EROFS will not have to bother with on-demand fetching and/or caching improvements anymore.) [1] https://github.com/containers/storage/pull/2039 [2] https://lore.kernel.org/r/CAOQ4uxjbVxnubaPjVaGYiSwoGDTdpWbB=w_AeM6YM=zVixsUfQ@mail.gmail.com [3] https://docs.kernel.org/filesystems/caching/fscache.html [4] https://lore.kernel.org/r/cover.1723670362.git.josef@toxicpanda.com Closes: https://github.com/containers/composefs/issues/144 Reviewed-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240830032840.3783206-1-hsiangkao@linux.alibaba.com	2024-09-10 15:26:35 +08:00
Gao Xiang	9e2f9d34dd	erofs: handle overlapped pclusters out of crafted images properly syzbot reported a task hang issue due to a deadlock case where it is waiting for the folio lock of a cached folio that will be used for cache I/Os. After looking into the crafted fuzzed image, I found it's formed with several overlapped big pclusters as below: Ext: logical offset \| length : physical offset \| length 0: 0.. 16384 \| 16384 : 151552.. 167936 \| 16384 1: 16384.. 32768 \| 16384 : 155648.. 172032 \| 16384 2: 32768.. 49152 \| 16384 : 537223168.. 537239552 \| 16384 ... Here, extent 0/1 are physically overlapped although it's entirely _impossible_ for normal filesystem images generated by mkfs. First, managed folios containing compressed data will be marked as up-to-date and then unlocked immediately (unlike in-place folios) when compressed I/Os are complete. If physical blocks are not submitted in the incremental order, there should be separate BIOs to avoid dependency issues. However, the current code mis-arranges z_erofs_fill_bio_vec() and BIO submission which causes unexpected BIO waits. Second, managed folios will be connected to their own pclusters for efficient inter-queries. However, this is somewhat hard to implement easily if overlapped big pclusters exist. Again, these only appear in fuzzed images so let's simply fall back to temporary short-lived pages for correctness. Additionally, it justifies that referenced managed folios cannot be truncated for now and reverts part of commit `2080ca1ed3` ("erofs: tidy up `struct z_erofs_bvec`") for simplicity although it shouldn't be any difference. 
Reported-by: syzbot+4fc98ed414ae63d1ada2@syzkaller.appspotmail.com Reported-by: syzbot+de04e06b28cfecf2281c@syzkaller.appspotmail.com Reported-by: syzbot+c8c8238b394be4a1087d@syzkaller.appspotmail.com Tested-by: syzbot+4fc98ed414ae63d1ada2@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/0000000000002fda01061e334873@google.com Fixes: `8e6c8fa9f2` ("erofs: enable big pcluster feature") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240910070847.3356592-1-hsiangkao@linux.alibaba.com	2024-09-10 15:26:15 +08:00
Sandeep Dhavale	3fc3e45fcd	erofs: fix error handling in z_erofs_init_decompressor If we get a failure at the first decompressor init (i = 0), the cleanup while loop could enter an infinite loop due to an incorrect while condition. Check the value of i now to see if we need any cleanup at all. Fixes: `5a7cce827e` ("erofs: refine z_erofs_{init,exit}_subsystem()") Reported-by: liujinbao1 <liujinbao1@xiaomi.com> Signed-off-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240905060027.2388893-1-dhavale@google.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-09-10 00:46:34 +08:00
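The failure mode in commit 3fc3e45fcd is a classic unwind-loop pitfall: a condition that decrements before testing (for example `while (--i)`) starts from -1 when the very first init fails and does not terminate as intended. The sketch below shows one safe unwinding pattern in userspace C. All `demo_*` names are hypothetical, and this is not the actual patch, only an illustration of checking `i` before doing any cleanup.

```c
#include <string.h>

#define DEMO_NR 4

static int demo_inited[DEMO_NR];	/* 1 if entry i is currently initialized */
static int demo_fail_at;		/* index whose init fails; -1 for none */

static int demo_init_one(int i)
{
	if (i == demo_fail_at)
		return -1;
	demo_inited[i] = 1;
	return 0;
}

static void demo_exit_one(int i)
{
	demo_inited[i] = 0;
}

static int demo_init_all(void)
{
	int i, err;

	for (i = 0; i < DEMO_NR; ++i) {
		err = demo_init_one(i);
		if (err) {
			/* i-- tests i before decrementing: no iterations when i == 0. */
			while (i--)
				demo_exit_one(i);
			return err;
		}
	}
	return 0;
}

/* Runs one init attempt and returns how many entries remain initialized. */
int demo_live_after_init(int fail_at)
{
	int i, live = 0;

	memset(demo_inited, 0, sizeof(demo_inited));
	demo_fail_at = fail_at;
	(void)demo_init_all();
	for (i = 0; i < DEMO_NR; ++i)
		live += demo_inited[i];
	return live;
}
```

With `fail_at` set to 0 the cleanup loop body never runs; with a later failure index every previously initialized entry is torn down in reverse order.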
Gao Xiang	59aadaa7eb	erofs: clean up erofs_register_sysfs() After commit `684b290abc` ("erofs: add support for FS_IOC_GETFSSYSFSPATH"), `sb->s_sysfs_name` is now valid. Just use it to get rid of duplicated logic. Reviewed-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240828095232.571946-1-hsiangkao@linux.alibaba.com	2024-09-10 00:46:34 +08:00
Gao Xiang	9ed50b8231	erofs: fix incorrect symlink detection in fast symlink Fast symlink can be used if the on-disk symlink data is stored in the same block as the on-disk inode, so we don’t need to trigger another I/O for symlink data. However, currently fs corruption could be reported _incorrectly_ if inode xattrs are too large. In fact, these should be valid images although they cannot be handled as fast symlinks. Many thanks to Colin for reporting this! Reported-by: Colin Walters <walters@verbum.org> Reported-by: https://honggfuzz.dev/ Link: https://lore.kernel.org/r/bb2dd430-7de0-47da-ae5b-82ab2dd4d945@app.fastmail.com Fixes: `431339ba90` ("staging: erofs: add inode operations") [ Note that it's a runtime misbehavior instead of a security issue. ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240909031911.1174718-1-hsiangkao@linux.alibaba.com	2024-09-10 00:45:13 +08:00
Gao Xiang	0005e01e1e	erofs: fix out-of-bound access when z_erofs_gbuf_growsize() partially fails If z_erofs_gbuf_growsize() partially fails on a global buffer due to memory allocation failure or fault injection (as reported by syzbot [1]), new pages need to be freed by comparing to the existing pages to avoid memory leaks. However, the old gbuf->pages[] array may not be large enough, which can lead to null-ptr-deref or out-of-bound access. Fix this by checking against gbuf->nrpages in advance. [1] https://lore.kernel.org/r/000000000000f7b96e062018c6e3@google.com Reported-by: syzbot+242ee56aaa9585553766@syzkaller.appspotmail.com Fixes: `d6db47e571` ("erofs: do not use pagepool in z_erofs_gbuf_growsize()") Cc: <stable@vger.kernel.org> # 6.10+ Reviewed-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Sandeep Dhavale <dhavale@google.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240820085619.1375963-1-hsiangkao@linux.alibaba.com	2024-08-21 08:12:05 +08:00
Gao Xiang	e080a26725	erofs: allow large folios for compressed files As commit `2e6506e1c4` ("mm/migrate: fix deadlock in migrate_pages_batch() on large folios") has landed upstream, large folios can be safely enabled for compressed inodes since all prerequisites have already landed in 6.11-rc1. Stress tests have been running on my fleet for over 20 days without any regression. Additionally, users [1] have requested it for months. Let's allow large folios for EROFS full cases upstream now for wider testing. [1] https://lore.kernel.org/r/CAGsJ_4wtE8OcpinuqVwG4jtdx6Qh5f+TON6wz+4HMCq=A2qFcA@mail.gmail.com Cc: Barry Song <21cnbao@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> [ Gao Xiang: minor commit typo fixes. ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240819025207.3808649-1-hsiangkao@linux.alibaba.com	2024-08-19 16:10:04 +08:00
Hongzhen Luo	2c534624ae	erofs: get rid of check_layout_compatibility() Simple enough to just open-code it. Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com> Reviewed-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240806112208.150323-1-hongzhen@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-08-19 11:06:20 +08:00
Hongzhen Luo	5b5c96c63d	erofs: simplify readdir operation - Use i_size instead of i_size_read() due to immutable fses; - Get rid of an unneeded goto since erofs_fill_dentries() also works; - Remove unnecessary lines. Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com> Link: https://lore.kernel.org/r/20240801112622.2164029-1-hongzhen@linux.alibaba.com Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-08-19 11:06:20 +08:00
Chen Ni	14e9283fb2	erofs: convert comma to semicolon Replace a comma between expression statements by a semicolon. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Link: https://lore.kernel.org/r/20240724020721.2389738-1-nichen@iscas.ac.cn Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-07-26 18:48:12 +08:00
Gao Xiang	5d3bb77e5f	erofs: support multi-page folios for erofs_bread() If the requested page is part of the previous multi-page folio, there is no need to call read_mapping_folio() again. Also, get rid of the remaining one of page->index [1] in our codebase. [1] https://lore.kernel.org/r/Zp8fgUSIBGQ1TN0D@casper.infradead.org Cc: Matthew Wilcox <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240723073024.875290-1-hsiangkao@linux.alibaba.com	2024-07-26 18:47:57 +08:00
Huang Xiaojia	684b290abc	erofs: add support for FS_IOC_GETFSSYSFSPATH FS_IOC_GETFSSYSFSPATH ioctl exposes /sys/fs path of a given filesystem, potentially standardizing sysfs reporting. This patch adds support for FS_IOC_GETFSSYSFSPATH for erofs, "erofs/<dev>" will be outputted for bdev cases, "erofs/[domain_id,]<fs_id>" will be outputted for fscache cases. Signed-off-by: Huang Xiaojia <huangxiaojia2@huawei.com> Link: https://lore.kernel.org/r/20240720082335.441563-1-huangxiaojia2@huawei.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-07-26 18:47:46 +08:00
Gao Xiang	7dc5537c3f	erofs: fix race in z_erofs_get_gbuf() In z_erofs_get_gbuf(), the current task may be migrated to another CPU between `z_erofs_gbuf_id()` and `spin_lock(&gbuf->lock)`. Therefore, z_erofs_put_gbuf() will trigger the following issue which was found by stress test: <2>[772156.434168] kernel BUG at fs/erofs/zutil.c:58! .. <4>[772156.435007] <4>[772156.439237] CPU: 0 PID: 3078 Comm: stress Kdump: loaded Tainted: G E 6.10.0-rc7+ #2 <4>[772156.439239] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 1.0.0 01/01/2017 <4>[772156.439241] pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) <4>[772156.439243] pc : z_erofs_put_gbuf+0x64/0x70 [erofs] <4>[772156.439252] lr : z_erofs_lz4_decompress+0x600/0x6a0 [erofs] .. <6>[772156.445958] stress (3127): drop_caches: 1 <4>[772156.446120] Call trace: <4>[772156.446121] z_erofs_put_gbuf+0x64/0x70 [erofs] <4>[772156.446761] z_erofs_lz4_decompress+0x600/0x6a0 [erofs] <4>[772156.446897] z_erofs_decompress_queue+0x740/0xa10 [erofs] <4>[772156.447036] z_erofs_runqueue+0x428/0x8c0 [erofs] <4>[772156.447160] z_erofs_readahead+0x224/0x390 [erofs] .. Fixes: `f36f3010f6` ("erofs: rename per-CPU buffers to global buffer pool and make it configurable") Cc: <stable@vger.kernel.org> # 6.10+ Reviewed-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240722035110.3456740-1-hsiangkao@linux.alibaba.com	2024-07-26 18:47:33 +08:00
Hongbo Li	9c421ef3f6	erofs: support STATX_DIOALIGN Add support for STATX_DIOALIGN to EROFS, so that direct I/O alignment restrictions are exposed to userspace in a generic way. [Before] ``` ./statx_test /mnt/erofs/testfile statx(/mnt/erofs/testfile) = 0 dio mem align:0 dio offset align:0 ``` [After] ``` ./statx_test /mnt/erofs/testfile statx(/mnt/erofs/testfile) = 0 dio mem align:512 dio offset align:512 ``` Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240718083243.2485437-1-hsiangkao@linux.alibaba.com	2024-07-26 18:47:22 +08:00
Dan Carpenter	a3c10bed33	erofs: silence uninitialized variable warning in z_erofs_scan_folio() Smatch complains that: fs/erofs/zdata.c:1047 z_erofs_scan_folio() error: uninitialized symbol 'err'. The issue is if we hit this (!(map->m_flags & EROFS_MAP_MAPPED)) { condition then "err" isn't set. It's inside a loop so we would have to hit that condition on every iteration. Initialize "err" to zero to solve this. Fixes: `5b9654efb6` ("erofs: teach z_erofs_scan_folios() to handle multi-page folios") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/f78ab50e-ed6d-4275-8dd4-a4159fa565a2@stanley.mountain Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-07-13 12:47:34 +08:00
Gao Xiang	1001042e54	erofs: avoid refcounting short-lived pages LZ4 always reuses the decompressed buffer as its LZ77 sliding window (dynamic dictionary) for optimal performance. However, in specific cases, the output buffer may not fully contain valid page cache pages, resulting in the use of short-lived pages for temporary purposes. Due to the limited sliding window size, LZ4 shortlived bounce pages can also be reused in a sliding manner, so each bounce page can be vmapped multiple times in different relative positions by design. In order to avoid double frees, currently, reuse counts are recorded via page refcount, but it will no longer be used as-is in the future world of Memdescs. Just maintain a lookup table to check if a shortlived page is reused. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240711053659.1364989-1-hsiangkao@linux.alibaba.com	2024-07-11 15:14:26 +08:00
Hongzhen Luo	1c076f1f4d	erofs: get rid of z_erofs_map_blocks_iter_* tracepoints Consolidate them under erofs_map_blocks_* for simplicity since we have many other ways to know if a given inode is compressed or not. Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240710083459.208362-1-hongzhen@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-07-10 18:57:06 +08:00
Gao Xiang	84a2ceefff	erofs: tidy up stream decompressors Just use a generic helper to prepare buffers for all supported stream decompressors, eliminating similar logic. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240709094106.3018109-3-hsiangkao@linux.alibaba.com	2024-07-09 19:04:41 +08:00
Gao Xiang	5a7cce827e	erofs: refine z_erofs_{init,exit}_subsystem() Introduce z_erofs_{init,exit}_decompressor() to unexport z_erofs_{deflate,lzma,zstd}_{init,exit}(). Besides, call them in z_erofs_{init,exit}_subsystem() for simplicity. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240709094106.3018109-2-hsiangkao@linux.alibaba.com	2024-07-09 19:04:40 +08:00
Gao Xiang	392d20ccef	erofs: move each decompressor to its own source file Thus *_config() function declarations can be avoided. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240709094106.3018109-1-hsiangkao@linux.alibaba.com	2024-07-09 19:04:40 +08:00
Gao Xiang	2080ca1ed3	erofs: tidy up `struct z_erofs_bvec` After revisiting the design, I believe `struct z_erofs_bvec` should be page-based instead of folio-based due to the reasons below: - The minimized memory mapping block is a page; - Under certain circumstances, only temporary pages need to be used instead of folios since refcount, mapcount for such pages are unnecessary; - Decompressors handle all types of pages including temporary pages, not only folios. When handling `struct z_erofs_bvec`, all folio-related information is now accessed using the page_folio() helper. The final goal of this round of adaptation is to eliminate direct accesses to `struct page` in the EROFS codebase, except for some exceptions like `z_erofs_is_shortlived_page()` and `z_erofs_page_is_invalidated()`, which require a new helper to determine the memdesc type of an arbitrary page. Actually large folios of compressed files seem to work now, yet I tend to conduct more tests before officially enabling this for all scenarios. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240703120051.3653452-4-hsiangkao@linux.alibaba.com	2024-07-08 22:09:42 +08:00
Gao Xiang	5b9654efb6	erofs: teach z_erofs_scan_folios() to handle multi-page folios Previously, a folio just contained one page. In order to enable large folios, z_erofs_scan_folios() needs to handle multi-page folios. First, this patch eliminates all gotos. Instead, the new loop deals with multiple parts in each folio. It's simple to handle the parts which belong to unmapped extents or fragment extents; but for encoded extents, the page boundaries need to be considered for `tight` and `split` to keep in-place I/Os working correctly: when a part crosses the page boundary, they need to be reset properly. Besides, simplify `tight` derivation since Z_EROFS_PCLUSTER_HOOKED has been removed for quite a while. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240703120051.3653452-3-hsiangkao@linux.alibaba.com	2024-07-08 22:09:42 +08:00
Gao Xiang	90cd33d793	erofs: convert z_erofs_read_fragment() to folios Just a straight-forward conversion. No logic changes. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240703120051.3653452-2-hsiangkao@linux.alibaba.com	2024-07-08 22:09:42 +08:00
Gao Xiang	1a4821a0a0	erofs: convert z_erofs_pcluster_readmore() to folios Unlike `pagecache_get_page()`, `__filemap_get_folio()` returns error pointers instead of NULL, thus switching to `IS_ERR_OR_NULL`. Apart from that, it's just a straightforward conversion. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240703120051.3653452-1-hsiangkao@linux.alibaba.com	2024-07-08 22:09:41 +08:00
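The distinction drawn in commit 1a4821a0a0 between NULL returns and error-pointer returns can be illustrated with a userspace re-creation of the kernel's error-pointer convention, in which small negative errno values are encoded in the top of the address range. The `demo_*` names below are hypothetical stand-ins rather than the kernel macros themselves, and `DEMO_MAX_ERRNO` assumes the conventional limit of 4095.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095	/* assumed upper bound on errno values */

/* Encode a negative errno value as a pointer. */
static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer lies in the last DEMO_MAX_ERRNO addresses. */
static inline bool demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Covers both failure conventions: NULL and encoded errno. */
bool demo_is_err_or_null(const void *ptr)
{
	return !ptr || demo_is_err(ptr);
}

int demo_object;	/* an ordinary object with a valid address, for comparison */
```

A plain `!ptr` test would treat `demo_err_ptr(-ENOENT)` as a valid object, which is why a caller moving from a NULL-returning API to an error-pointer-returning one has to change its check.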
Gao Xiang	9b32b063be	erofs: ensure m_llen is reset to 0 if metadata is invalid Sometimes, the on-disk metadata might be invalid due to user interrupts, storage failures, or other unknown causes. In that case, z_erofs_map_blocks_iter() may still return a valid m_llen while other fields remain invalid (e.g., m_plen can be 0). Because the return value of z_erofs_scan_folio() is ignored on purpose in some paths, a subsequent z_erofs_scan_folio() could then use the invalid value by accident. Let's reset m_llen to 0 to prevent this. Link: https://lore.kernel.org/r/20240629185743.2819229-1-hsiangkao@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-06-30 10:54:28 +08:00
Huang Xiaojia	cc69a681b2	erofs: convert to use super_set_uuid to support for FS_IOC_GETFSUUID FS_IOC_GETFSUUID ioctl exposes the uuid of a filesystem. To support the ioctl, init sb->s_uuid with super_set_uuid(). Signed-off-by: Huang Xiaojia <huangxiaojia2@huawei.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240624063704.2476070-1-huangxiaojia2@huawei.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-06-26 17:02:28 +08:00
Sandeep Dhavale	9d01f6f6d8	erofs: fix possible memory leak in z_erofs_gbuf_exit() Because we incorrectly reused the variable `i` in `z_erofs_gbuf_exit()` for the inner loop, we may exit early from the outer loop, resulting in a memory leak. Fix this by using a separate variable for iterating through the inner loop. Fixes: `f36f3010f6` ("erofs: rename per-CPU buffers to global buffer pool and make it configurable") Signed-off-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240624220206.3373197-1-dhavale@google.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-06-26 17:02:15 +08:00
Linus Torvalds	dcb9f48667	Changes since last update: - Convert metadata APIs to byte offsets; - Avoid allocating DEFLATE streams unnecessarily; - Some erofs_show_options() cleanup. -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmZQmHARHHhpYW5nQGtl cm5lbC5vcmcACgkQUXZn5Zlu5qrGnhAAnvOifMYekIgY/W0PSGSe85XtXps5vBjo rixZ/vNAl8NrLgzHY5lX+4dbENywEULzdxYAgF4VN9eKNGyuZ4oCBmYStoGueQ41 N1oq36O/CVJDCOLkFUwjD6GpHngjJR3xiU8DRrhKdPZJeYXVEJwZB4KOOymorkO0 Xn9SPrF/GC4YDWJL901RKT8p6gyRNWiWJ/+hwDAxfmCSuzW2uRNnBLeXNvjqj4Z3 u5WEaFSlNRlLWnZPcHy8O3t/XAPkhvTN+C5+YeaePWyHc5WYOM9mWt8VLOFQb60K l+q/cnWXw+8NNbxnuccWVJfEb6zUJmZ5/yTm+Ndutrpk5dFSPb6DjZo5/K36dGls r02XysW+Jl24wBIFkYRHild2WT+gSqo/zyIDsSt/DF+DhpqmnIqAASx4yJenw7ib BNV4m4gQflLrORKpVmsKyHrm5GuHsTWsGc51iX1uqsdfDgN79mFgR1taBAZw162P pPeWuD6XYE+eT+t5nggnXqmZ5qatEhTFkYDjUzSq4ZQfyZnRG8Tl6zbBuyVhaxsO zH1rAmwtI6x+ehHI46Kurh8HT6UrB0CNM6RokYKr6JWVzIdFPPMVKkxcq2KozTPf CBu+Whh/WGFROM8JT2KGCnuz2ZBUZXDtNBJmW+ZnA+z9b7xZ1f31nio4vKKdZU+R swpnV+0q9cs= =qDDl -----END PGP SIGNATURE----- Merge tag 'erofs-for-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull more erofs updates from Gao Xiang: "The main ones are metadata API conversion to byte offsets by Al Viro. Another patch gets rid of unnecessary memory allocation out of DEFLATE decompressor. The remaining one is a trivial cleanup. - Convert metadata APIs to byte offsets - Avoid allocating DEFLATE streams unnecessarily - Some erofs_show_options() cleanup" * tag 'erofs-for-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: avoid allocating DEFLATE streams before mounting z_erofs_pcluster_begin(): don't bother with rounding position down erofs: don't round offset down for erofs_read_metabuf() erofs: don't align offset for erofs_read_metabuf() (simple cases) erofs: mechanically convert erofs_read_metabuf() to offsets erofs: clean up erofs_show_options()	2024-05-24 09:31:50 -07:00
Linus Torvalds	38da32ee70	bd_inode series Replacement of bdev->bd_inode with sane(r) set of primitives. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZkwjlgAKCRBZ7Krx/gZQ 66OmAP9nhZLASn/iM2+979I6O0GW+vid+uLh48uW3d+LbsmVIgD9GYpR+cuLQ/xj mJESWfYKOVSpFFSrqlzKg9PQlU/GFgs= =6LRp -----END PGP SIGNATURE----- Merge tag 'pull-bd_inode-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull bdev bd_inode updates from Al Viro: "Replacement of bdev->bd_inode with sane(r) set of primitives by me and Yu Kuai" * tag 'pull-bd_inode-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RIP ->bd_inode dasd_format(): killing the last remaining user of ->bd_inode nilfs_attach_log_writer(): use ->bd_mapping->host instead of ->bd_inode block/bdev.c: use the knowledge of inode/bdev coallocation gfs2: more obvious initializations of mapping->host fs/buffer.c: massage the remaining users of ->bd_inode to ->bd_mapping blk_ioctl_{discard,zeroout}(): we only want ->bd_inode->i_mapping here... grow_dev_folio(): we only want ->bd_inode->i_mapping there use ->bd_mapping instead of ->bd_inode->i_mapping block_device: add a pointer to struct address_space (page cache of bdev) missing helpers: bdev_unhash(), bdev_drop() block: move two helpers into bdev.c block2mtd: prevent direct access of bd_inode dm-vdo: use bdev_nr_bytes(bdev) instead of i_size_read(bdev->bd_inode) blkdev_write_iter(): saner way to get inode and bdev bcachefs: remove dead function bdev_sectors() ext4: remove block_device_ejected() erofs_buf: store address_space instead of inode erofs: switch erofs_bread() to passing offset instead of block number	2024-05-21 09:51:42 -07:00
Gao Xiang	80eb4f6205	erofs: avoid allocating DEFLATE streams before mounting Currently, each DEFLATE stream takes one 32 KiB permanent internal window buffer even if there is no running instance which uses DEFLATE algorithm. It's unexpected and wasteful on embedded devices with limited resources and servers with hundreds of CPU cores if DEFLATE is enabled but unused. Fixes: `ffa09b3bd0` ("erofs: DEFLATE compression support") Cc: <stable@vger.kernel.org> # 6.6+ Reviewed-by: Sandeep Dhavale <dhavale@google.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240520090106.2898681-1-hsiangkao@linux.alibaba.com	2024-05-21 03:07:39 +08:00
Al Viro	5587a8172e	z_erofs_pcluster_begin(): don't bother with rounding position down ... and be more idiomatic when calculating ->pageofs_in. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/20240425200017.GF1031757@ZenIV [ Gao Xiang: don't use `offset_in_page(mptr)` due to EROFS_NO_KMAP. ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-05-18 01:53:04 +08:00
Al Viro	4afe6b8d21	erofs: don't round offset down for erofs_read_metabuf() There's only one place where struct z_erofs_maprecorder ->kaddr is used not in the same function that has assigned it - the value read in unpack_compacted_index() gets calculated in z_erofs_load_compact_lcluster(). With minor massage we can switch to storing it with offset in block already added. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/20240425195944.GE1031757@ZenIV Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-05-18 01:52:48 +08:00
Al Viro	076d965eb8	erofs: don't align offset for erofs_read_metabuf() (simple cases) Most of the callers of erofs_read_metabuf() have the following form: block = erofs_blknr(sb, offset); off = erofs_blkoff(sb, offset); p = erofs_read_metabuf(...., erofs_pos(sb, block), ...); if (IS_ERR(p)) return PTR_ERR(p); q = p + off; // no further uses of p, block or off. The value passed to erofs_read_metabuf() is offset rounded down to block size, i.e. offset - off. Passing offset as-is would increase the return value by off in case of success and keep the return value unchanged in case of error. In other words, the same could be achieved by q = erofs_read_metabuf(...., offset, ...); if (IS_ERR(q)) return PTR_ERR(q); This commit converts these simple cases. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/20240425195915.GD1031757@ZenIV Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-05-18 01:47:26 +08:00
Al Viro	e09815446d	erofs: mechanically convert erofs_read_metabuf() to offsets just lift the call of erofs_pos() into the callers; it will collapse in most of them, but that's better done caller-by-caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/20240425195846.GC1031757@ZenIV Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-05-18 01:46:18 +08:00
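
The equivalence argued in the erofs_read_metabuf() conversion commits above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: `demo_disk`, `DEMO_BLKSZ`, `demo_read_metabuf()`, `demo_old_form()` and `demo_new_form()` are hypothetical names invented here, the "metadata" is a flat in-memory array, and failure is modelled with NULL rather than the kernel's error pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; none of these exist in the kernel. */
#define DEMO_BLKSZ 4096u

static unsigned char demo_disk[4 * DEMO_BLKSZ];

/*
 * Model of a metadata read that takes a byte offset: returns a pointer to
 * the byte at `offset`, or NULL when the offset is out of range.
 * (Assumption: the real helper returns an error pointer instead of NULL.)
 */
static unsigned char *demo_read_metabuf(uint64_t offset)
{
	if (offset >= sizeof(demo_disk))
		return NULL;
	return demo_disk + offset;
}

/* Old calling form: round down to the block start, then add the remainder. */
static unsigned char *demo_old_form(uint64_t offset)
{
	uint64_t block = offset / DEMO_BLKSZ;	/* block number */
	uint64_t off = offset % DEMO_BLKSZ;	/* offset within the block */
	unsigned char *p = demo_read_metabuf(block * DEMO_BLKSZ);

	if (!p)
		return NULL;
	return p + off;
}

/* New calling form: pass the byte offset as-is. */
static unsigned char *demo_new_form(uint64_t offset)
{
	return demo_read_metabuf(offset);
}

/* Returns nonzero when both forms yield the same pointer (or both fail). */
static int demo_forms_agree(uint64_t offset)
{
	return demo_old_form(offset) == demo_new_form(offset);
}

/* Checks a few in-range and out-of-range offsets. */
static void demo_self_check(void)
{
	assert(demo_forms_agree(0));
	assert(demo_forms_agree(DEMO_BLKSZ + 1));
	assert(demo_forms_agree(sizeof(demo_disk)));
}
```

Calling `demo_self_check()` from a test program exercises both forms. In this model the two agree because the backing array is a whole number of blocks, so rounding an out-of-range offset down never brings it back into range.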

533 Commits