linux

Author	SHA1	Message	Date
Hongnan Li	ecce9212d0	erofs: update ctx->pos for every emitted dirent erofs_readdir update ctx->pos after filling a batch of dentries and it may cause dir/files duplication for NFS readdirplus which depends on ctx->pos to fill dir correctly. So update ctx->pos for every emitted dirent in erofs_fill_dentries to fix it. Also fix the update of ctx->pos when the initial file position has exceeded nameoff. Fixes: `3e917cc305` ("erofs: make filesystem exportable") Signed-off-by: Hongnan Li <hongnan.li@linux.alibaba.com> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220722082732.30935-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-07-31 22:26:29 +08:00
Gao Xiang	cc2a171372	erofs: get rid of the leftover PAGE_SIZE in dir.c Convert the last hardcoded PAGE_SIZEs of uncompressed cases. Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220619150940.121005-1-hsiangkao@linux.alibaba.com	2022-07-22 21:46:26 +08:00
Gao Xiang	de8a801ab6	erofs: get rid of erofs_prepare_dio() helper Fold in erofs_prepare_dio() in order to simplify the code. Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220720082229.12172-1-hsiangkao@linux.alibaba.com	2022-07-22 21:46:03 +08:00
Gao Xiang	267f2492c8	erofs: introduce multi-reference pclusters (fully-referenced) Let's introduce multi-reference pclusters at runtime. In details, if one pcluster is requested by multiple extents at almost the same time (even belong to different files), the longest extent will be decompressed as representative and the other extents are actually copied from the longest one in one round. After this patch, fully-referenced extents can be correctly handled and the full decoding check needs to be bypassed for partial-referenced extents. Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220715154203.48093-17-hsiangkao@linux.alibaba.com	2022-07-22 21:44:27 +08:00
Gao Xiang	2bfab9c0ed	erofs: record the longest decompressed size in this round Currently, `pcl->length' records the longest decompressed length as long as the pcluster itself isn't reclaimed. However, such number is unneeded for the general cases since it doesn't indicate the exact decompressed size in this round. Instead, let's record the decompressed size for this round instead, thus `pcl->nr_pages' can be completely dropped and pageofs_out is also designed to be kept in sync with `pcl->length'. Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220715154203.48093-16-hsiangkao@linux.alibaba.com	2022-07-21 22:55:44 +08:00
Gao Xiang	3fe96ee0f9	erofs: introduce z_erofs_do_decompressed_bvec() Both out_bvecs and in_bvecs share the common logic for decompressed buffers. So let's make a helper for this. Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220715154203.48093-15-hsiangkao@linux.alibaba.com	2022-07-21 22:55:37 +08:00
Gao Xiang	fe3e5914e6	erofs: try to leave (de)compressed_pages on stack if possible For the most cases, small pclusters can be decompressed with page arrays on stack. Try to leave both (de)compressed_pages on stack if possible as before. Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220715154203.48093-14-hsiangkao@linux.alibaba.com	2022-07-21 22:55:30 +08:00
Gao Xiang	4f05687fd7	erofs: introduce struct z_erofs_decompress_backend Let's introduce struct z_erofs_decompress_backend in order to pass on the decompression backend context between helper functions more easier and avoid too many arguments. Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220715154203.48093-13-hsiangkao@linux.alibaba.com	2022-07-21 22:55:22 +08:00
Gao Xiang	e73681877d	erofs: get rid of `z_pagemap_global' In order to introduce multi-reference pclusters for compressed data deduplication, let's get rid of the global page array for now since it needs to be re-designed then at least. Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220715154203.48093-12-hsiangkao@linux.alibaba.com	2022-07-21 22:55:15 +08:00
Gao Xiang	db166fc202	erofs: clean up `enum z_erofs_collectmode' `enum z_erofs_collectmode' is really ambiguous, but I'm not quite sure if there are better naming, basically it's used to judge whether inplace I/O can be used due to the current status of pclusters in the chain. Rename it as `enum z_erofs_pclustermode' instead. Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220715154203.48093-11-hsiangkao@linux.alibaba.com	2022-07-21 22:55:07 +08:00
Gao Xiang	5b220b204c	erofs: get rid of `enum z_erofs_page_type' Remove it since pagevec[] is no longer used. Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220715154203.48093-10-hsiangkao@linux.alibaba.com	2022-07-21 22:54:54 +08:00
Gao Xiang	671485516e	erofs: rework online page handling Since all decompressed offsets have been integrated to bvecs[], this patch avoids all sub-indexes so that page->private only includes a part count and an eio flag, thus in the future folio->private can have the same meaning. In addition, PG_error will not be used anymore after this patch and we're heading to use page->private (later folio->private) and page->mapping (later folio->mapping) only. Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220715154203.48093-9-hsiangkao@linux.alibaba.com	2022-07-21 22:54:46 +08:00
Gao Xiang	ed722fbcca	erofs: switch compressed_pages[] to bufvec Convert compressed_pages[] to bufvec in order to avoid using page->private to keep onlinepage_index (decompressed offset) for inplace I/O pages. In the future, we only rely on folio->private to keep a countdown to unlock folios and set folio_uptodate. Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220715154203.48093-8-hsiangkao@linux.alibaba.com	2022-07-21 22:54:37 +08:00
Gao Xiang	67139e36d9	erofs: introduce `z_erofs_parse_in_bvecs' `z_erofs_decompress_pcluster()' is too long therefore it'd be better to introduce another helper to parse compressed pages (or laterly, compressed bvecs.) BTW, since `compressed_bvecs' is too long as a part of the function name, `in_bvecs' is used here instead. Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220715154203.48093-7-hsiangkao@linux.alibaba.com	2022-07-21 22:54:29 +08:00
Gao Xiang	387bab8716	erofs: drop the old pagevec approach Remove the old pagevec approach but keep z_erofs_page_type for now. It will be reworked in the following commits as well. Also rename Z_EROFS_NR_INLINE_PAGEVECS as Z_EROFS_INLINE_BVECS with the new value 2 since it's actually enough to bootstrap. Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220715154203.48093-6-hsiangkao@linux.alibaba.com	2022-07-21 22:54:20 +08:00
Gao Xiang	06a304cd9c	erofs: introduce bufvec to store decompressed buffers For each pcluster, the total compressed buffers are determined in advance, yet the number of decompressed buffers actually vary. Too many decompressed pages can be recorded if one pcluster is highly compressed or its pcluster size is large. That takes extra memory footprints compared to uncompressed filesystems, especially a lot of I/O in flight on low-ended devices. Therefore, similar to inplace I/O, pagevec was introduced to reuse page cache to store these pointers in the time-sharing way since these pages are actually unused before decompressing. In order to make it more flexable, a cleaner bufvec is used to replace the old pagevec stuffs so that - Decompressed offsets can be stored inline, thus it can be used for the upcoming feature like compressed data deduplication. It's calculated by `page_offset(page) - map->m_la'; - Towards supporting large folios for compressed inodes since our final goal is to completely avoid page->private but use folio->private only for all page cache pages. Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220715154203.48093-5-hsiangkao@linux.alibaba.com	2022-07-21 22:54:10 +08:00
Gao Xiang	42fec235f1	erofs: introduce `z_erofs_parse_out_bvecs()' `z_erofs_decompress_pcluster()' is too long therefore it'd be better to introduce another helper to parse decompressed pages (or laterly, decompressed bvecs.) BTW, since `decompressed_bvecs' is too long as a part of the function name, `out_bvecs' is used instead. Reviewed-by: Yue Hu <huyue2@coolpad.com> Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220715154203.48093-4-hsiangkao@linux.alibaba.com	2022-07-21 22:53:53 +08:00
Gao Xiang	0d823b424f	erofs: clean up z_erofs_collector_begin() Rearrange the code and get rid of all gotos. Reviewed-by: Yue Hu <huyue2@coolpad.com> Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220715154203.48093-3-hsiangkao@linux.alibaba.com	2022-07-21 22:53:43 +08:00
Gao Xiang	83a386c0a5	erofs: get rid of unneeded `inode',` map' and `sb' Since commit `5c6dcc57e2` ("erofs: get rid of `struct z_erofs_collector'"), these arguments can be dropped as well. No logic changes. Reviewed-by: Yue Hu <huyue2@coolpad.com> Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220715154203.48093-2-hsiangkao@linux.alibaba.com	2022-07-21 22:53:33 +08:00
Gao Xiang	448b5a1548	erofs: avoid consecutive detection for Highmem memory Currently, vmap()s are avoided if physical addresses are consecutive for decompressed buffers. I observed that is very common for 4KiB pclusters since the numbers of decompressed pages are almost 2 or 3. However, such detection doesn't work for Highmem pages on 32-bit machines, let's fix it now. Reported-by: Liu Jinbao <liujinbao1@xiaomi.com> Fixes: `7fc45dbc93` ("staging: erofs: introduce generic decompression backend") Link: https://lore.kernel.org/r/20220708101001.21242-1-hsiangkao@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-07-09 06:35:09 +08:00
Yuwen Chen	2df7c4bd7c	erofs: wake up all waiters after z_erofs_lzma_head ready When the user mounts the erofs second times, the decompression thread may hung. The problem happens due to a sequence of steps like the following: 1) Task A called z_erofs_load_lzma_config which obtain all of the node from the z_erofs_lzma_head. 2) At this time, task B called the z_erofs_lzma_decompress and wanted to get a node. But the z_erofs_lzma_head was empty, the Task B had to sleep. 3) Task A release nodes and push nodes into the z_erofs_lzma_head. But task B was still sleeping. One example report when the hung happens: task:kworker/u3:1 state:D stack:14384 pid: 86 ppid: 2 flags:0x00004000 Workqueue: erofs_unzipd z_erofs_decompressqueue_work Call Trace: <TASK> __schedule+0x281/0x760 schedule+0x49/0xb0 z_erofs_lzma_decompress+0x4bc/0x580 ? cpu_core_flags+0x10/0x10 z_erofs_decompress_pcluster+0x49b/0xba0 ? __update_load_avg_se+0x2b0/0x330 ? __update_load_avg_se+0x2b0/0x330 ? update_load_avg+0x5f/0x690 ? update_load_avg+0x5f/0x690 ? set_next_entity+0xbd/0x110 ? _raw_spin_unlock+0xd/0x20 z_erofs_decompress_queue.isra.0+0x2e/0x50 z_erofs_decompressqueue_work+0x30/0x60 process_one_work+0x1d3/0x3a0 worker_thread+0x45/0x3a0 ? process_one_work+0x3a0/0x3a0 kthread+0xe2/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> Signed-off-by: Yuwen Chen <chenyuwen1@meizu.com> Fixes: `622ceaddb7` ("erofs: lzma compression support") Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220626224041.4288-1-chenyuwen1@meizu.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-07-09 06:32:29 +08:00
Linus Torvalds	8171acb8bc	Merge tag 'erofs-for-5.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull more erofs updates from Gao Xiang: "This is a follow-up to the main updates, including some fixes of fscache mode related to compressed inodes and a cachefiles tracepoint. There is also a patch to fix an unexpected decompression strategy change due to a cleanup in the past. All the fixes are quite small. Apart from these, documentation is also updated for a better description of recent new features. In addition, this has some trivial cleanups without actual code logic changes, so I could have a more recent codebase to work on folios and avoiding the PG_error page flag for the next cycle. Summary: - Leave compressed inodes unsupported in fscache mode for now - Avoid crash when using tracepoint cachefiles_prep_read - Fix `backmost' behavior due to a recent cleanup - Update documentation for better description of recent new features - Several decompression cleanups w/o logical change" * tag 'erofs-for-5.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix 'backmost' member of z_erofs_decompress_frontend erofs: simplify z_erofs_pcluster_readmore() erofs: get rid of label `restart_now' erofs: get rid of `struct z_erofs_collection' erofs: update documentation erofs: fix crash when enable tracepoint cachefiles_prep_read erofs: leave compressed inodes unsupported in fscache mode for now	2022-06-01 11:54:29 -07:00
Weizhao Ouyang	4398d3c31b	erofs: fix 'backmost' member of z_erofs_decompress_frontend Initialize 'backmost' to true in DECOMPRESS_FRONTEND_INIT. Fixes: `5c6dcc57e2` ("erofs: get rid of `struct z_erofs_collector'") Signed-off-by: Weizhao Ouyang <o451686892@gmail.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220530075114.918874-1-o451686892@gmail.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-31 23:15:30 +08:00
Gao Xiang	aa793b46bb	erofs: simplify z_erofs_pcluster_readmore() Get rid of unnecessary label `skip'. No logic changes. Link: https://lore.kernel.org/r/20220529055425.226363-4-xiang@kernel.org Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-31 23:15:21 +08:00
Gao Xiang	39397a46cf	erofs: get rid of label `restart_now' Simplify this part of code. No logic changes. Link: https://lore.kernel.org/r/20220529055425.226363-3-xiang@kernel.org Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-31 23:14:58 +08:00
Gao Xiang	87ca34a706	erofs: get rid of `struct z_erofs_collection' It was incompletely introduced for deduplication between different logical extents backed with the same pcluster. We will have a better in-memory representation in the next release cycle for this, as well as partial memory folios support. So get rid of it instead. No logic changes. Link: https://lore.kernel.org/r/20220529055425.226363-2-xiang@kernel.org Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-31 23:13:59 +08:00
Xin Yin	b5cb79dcfd	erofs: fix crash when enable tracepoint cachefiles_prep_read RIP: 0010:trace_event_raw_event_cachefiles_prep_read+0x88/0xe0 [cachefiles] Call Trace: <TASK> cachefiles_prepare_read+0x1d7/0x3a0 [cachefiles] erofs_fscache_read_folios+0x188/0x220 [erofs] erofs_fscache_meta_readpage+0x106/0x160 [erofs] do_read_cache_folio+0x42a/0x590 ? bdi_register_va.part.14+0x1a7/0x210 ? super_setup_bdi_name+0x76/0xe0 erofs_bread+0x5b/0x170 [erofs] erofs_fc_fill_super+0x12b/0xc50 [erofs] This tracepoint uses rreq->inode, should set it when allocating. Fixes: `d435d53228` ("erofs: change to use asynchronous io for fscache readpage/readahead") Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220527101800.22360-1-yinxin.x@bytedance.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-29 15:36:04 +08:00
Jeffle Xu	0130e4e8e4	erofs: leave compressed inodes unsupported in fscache mode for now erofs over fscache doesn't support the compressed layout yet. It will cause NULL crash if there are compressed inodes contained when working in fscache mode. So far in the erofs based container image distribution scenarios (RAFS v6), the compressed RAFS v6 images are downloaded and then decompressed on demand as an uncompressed erofs image. Then the erofs image is mounted in fscache mode for containers to use. IOWs, currently compressed data is decompressed on the userspace side instead and uncompressed erofs images will be finally cached. The fscache support for the compressed layout is still under development and it will be used for runtime decompression feature. Anyway, to avoid the potential crash, let's leave the compressed inodes unsupported in fscache mode until we support it later. Fixes: `1442b02b66` ("erofs: implement fscache-based data read for non-inline layout") Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220526010344.118493-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-29 15:34:54 +08:00
Linus Torvalds	fdaf9a5840	Merge tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache Pull page cache updates from Matthew Wilcox: - Appoint myself page cache maintainer - Fix how scsicam uses the page cache - Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS - Remove the AOP flags entirely - Remove pagecache_write_begin() and pagecache_write_end() - Documentation updates - Convert several address_space operations to use folios: - is_dirty_writeback - readpage becomes read_folio - releasepage becomes release_folio - freepage becomes free_folio - Change filler_t to require a struct file pointer be the first argument like ->read_folio * tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits) nilfs2: Fix some kernel-doc comments Appoint myself page cache maintainer fs: Remove aops->freepage secretmem: Convert to free_folio nfs: Convert to free_folio orangefs: Convert to free_folio fs: Add free_folio address space operation fs: Convert drop_buffers() to use a folio fs: Change try_to_free_buffers() to take a folio jbd2: Convert release_buffer_page() to use a folio jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio reiserfs: Convert release_buffer_page() to use a folio fs: Remove last vestiges of releasepage ubifs: Convert to release_folio reiserfs: Convert to release_folio orangefs: Convert to release_folio ocfs2: Convert to release_folio nilfs2: Remove comment about releasepage nfs: Convert to release_folio jfs: Convert to release_folio ...	2022-05-24 19:55:07 -07:00
Linus Torvalds	bd1b7c1384	Merge tag 'for-5.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "Features: - subpage: - support for PAGE_SIZE > 4K (previously only 64K) - make it work with raid56 - repair super block num_devices automatically if it does not match the number of device items - defrag can convert inline extents to regular extents, up to now inline files were skipped but the setting of mount option max_inline could affect the decision logic - zoned: - minimal accepted zone size is explicitly set to 4MiB - make zone reclaim less aggressive and don't reclaim if there are enough free zones - add per-profile sysfs tunable of the reclaim threshold - allow automatic block group reclaim for non-zoned filesystems, with sysfs tunables - tree-checker: new check, compare extent buffer owner against owner rootid Performance: - avoid blocking on space reservation when doing nowait direct io writes (+7% throughput for reads and writes) - NOCOW write throughput improvement due to refined locking (+3%) - send: reduce pressure to page cache by dropping extent pages right after they're processed Core: - convert all radix trees to xarray - add iterators for b-tree node items - support printk message index - user bulk page allocation for extent buffers - switch to bio_alloc API, use on-stack bios where convenient, other bio cleanups - use rw lock for block groups to favor concurrent reads - simplify workques, don't allocate high priority threads for all normal queues as we need only one - refactor scrub, process chunks based on their constraints and similarity - allocate direct io structures on stack and pass around only pointers, avoids allocation and reduces potential error handling Fixes: - fix count of reserved transaction items for various inode operations - fix deadlock between concurrent dio writes when low on free data space - fix a few cases when zones need to be finished VFS, iomap: - add helper to check if sb write has started (usable for assertions) - new helper iomap_dio_alloc_bio, export iomap_dio_bio_end_io" * tag 'for-5.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (173 commits) btrfs: zoned: introduce a minimal zone size 4M and reject mount btrfs: allow defrag to convert inline extents to regular extents btrfs: add "0x" prefix for unsupported optional features btrfs: do not account twice for inode ref when reserving metadata units btrfs: zoned: fix comparison of alloc_offset vs meta_write_pointer btrfs: send: avoid trashing the page cache btrfs: send: keep the current inode open while processing it btrfs: allocate the btrfs_dio_private as part of the iomap dio bio btrfs: move struct btrfs_dio_private to inode.c btrfs: remove the disk_bytenr in struct btrfs_dio_private btrfs: allocate dio_data on stack iomap: add per-iomap_iter private data iomap: allow the file system to provide a bio_set for direct I/O btrfs: add a btrfs_dio_rw wrapper btrfs: zoned: zone finish unused block group btrfs: zoned: properly finish block group on metadata write btrfs: zoned: finish block group when there are no more allocatable bytes left btrfs: zoned: consolidate zone finish functions btrfs: zoned: introduce btrfs_zoned_bg_is_full btrfs: improve error reporting in lookup_inline_extent_backref ...	2022-05-24 18:52:35 -07:00
Jeffle Xu	ba73eadd23	erofs: scan devices from device table When "-o device" mount option is not specified, scan the device table and instantiate the devices if there's any in the device table. In this case, the tag field of each device slot uniquely specifies a device. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220512055601.106109-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-18 00:11:21 +08:00
Xin Yin	d435d53228	erofs: change to use asynchronous io for fscache readpage/readahead Use asynchronous io to read data from fscache may greatly improve IO bandwidth for sequential buffered read scenario. Change erofs_fscache_read_folios to erofs_fscache_read_folios_async, and read data from fscache asynchronously. Make .readpage()/.readahead() to use this new helper. Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Link: https://lore.kernel.org/r/20220509074028.74954-23-jefflexu@linux.alibaba.com Acked-by: Chao Yu <chao@kernel.org> [ Gao Xiang: minor styling changes. ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-18 00:11:21 +08:00
Jeffle Xu	9c0cc9c729	erofs: add 'fsid' mount option Introduce 'fsid' mount option to enable on-demand read sementics, in which case, erofs will be mounted from data blobs. Users could specify the name of primary data blob by this mount option. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220425122143.56815-22-jefflexu@linux.alibaba.com Acked-by: Chao Yu <chao@kernel.org> Tested-by: Zichen Tian <tianzichen@kuaishou.com> Tested-by: Jia Zhu <zhujia.zj@bytedance.com> Tested-by: Yan Song <yansong.ys@antgroup.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-18 00:11:21 +08:00
Jeffle Xu	c665b394b9	erofs: implement fscache-based data readahead Implement fscache-based data readahead. Also registers an individual bdi for each erofs instance to enable readahead. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220425122143.56815-21-jefflexu@linux.alibaba.com Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-18 00:11:21 +08:00
Jeffle Xu	bd735bdaa6	erofs: implement fscache-based data read for inline layout Implement the data plane of reading data from data blobs over fscache for inline layout. For the heading non-inline part, the data plane for non-inline layout is reused, while only the tail packing part needs special handling. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220425122143.56815-20-jefflexu@linux.alibaba.com Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-18 00:11:20 +08:00
Jeffle Xu	1442b02b66	erofs: implement fscache-based data read for non-inline layout Implement the data plane of reading data from data blobs over fscache for non-inline layout. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220425122143.56815-19-jefflexu@linux.alibaba.com Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-18 00:11:20 +08:00
Jeffle Xu	5375e7c8b0	erofs: implement fscache-based metadata read Implement the data plane of reading metadata from primary data blob over fscache. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220425122143.56815-18-jefflexu@linux.alibaba.com Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-18 00:11:20 +08:00
Jeffle Xu	955b478e1b	erofs: register fscache context for extra data blobs Similar to the multi-device mode, erofs could be mounted from one primary data blob (mandatory) and multiple extra data blobs (optional). Register fscache context for each extra data blob. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220425122143.56815-17-jefflexu@linux.alibaba.com Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-18 00:11:20 +08:00
Jeffle Xu	37c90c5fae	erofs: register fscache context for primary data blob Registers fscache context for primary data blob. Also move the initialization of s_op and related fields forward, since anonymous inode will be allocated under the super block when registering the fscache context. Something worth mentioning about the cleanup routine. 1. The fscache context will instantiate anonymous inodes under the super block. Release these anonymous inodes when .put_super() is called, or we'll get "VFS: Busy inodes after unmount." warning. 2. The fscache context is initialized prior to the root inode. If .kill_sb() is called when mount failed, .put_super() won't be called when root inode has not been initialized yet. Thus .kill_sb() shall also contain the cleanup routine. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220425122143.56815-16-jefflexu@linux.alibaba.com Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-18 00:11:20 +08:00
Jeffle Xu	ec00b5e29c	erofs: add erofs_fscache_read_folios() helper Add erofs_fscache_read_folios() helper reading from fscache. It supports on-demand read semantics. That is, it will make the backend prepare for the data when cache miss. Once data ready, it will read from the cache. This helper can then be used to implement .readpage()/.readahead() of on-demand read semantics. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220425122143.56815-15-jefflexu@linux.alibaba.com Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-18 00:11:19 +08:00
Jeffle Xu	3c265d7dce	erofs: add anonymous inode caching metadata for data blobs Introduce one anonymous inode for data blobs so that erofs can cache metadata directly within such anonymous inode. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220425122143.56815-14-jefflexu@linux.alibaba.com Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-18 00:11:19 +08:00
Jeffle Xu	b02c602f06	erofs: add fscache context helper functions Introduce a context structure for managing data blobs, and helper functions for initializing and cleaning up this context structure. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220425122143.56815-13-jefflexu@linux.alibaba.com Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-18 00:11:19 +08:00
Jeffle Xu	c6be2bd0a5	erofs: register fscache volume A new fscache based mode is going to be introduced for erofs, in which case on-demand read semantics is implemented through fscache. As the first step, register fscache volume for each erofs filesystem. That means, data blobs can not be shared among erofs filesystems. In the following iteration, we are going to introduce the domain semantics, in which case several erofs filesystems can belong to one domain, and data blobs can be shared among these erofs filesystems of one domain. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220425122143.56815-12-jefflexu@linux.alibaba.com Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-18 00:11:19 +08:00
Jeffle Xu	93b856bb5f	erofs: add fscache mode check helper Until then erofs is exactly blockdev based filesystem. A new fscache-based mode is going to be introduced for erofs to support scenarios where on-demand read semantics is needed, e.g. container image distribution. In this case, erofs could be mounted from data blobs through fscache. Add a helper checking which mode erofs works in, and twist the code in preparation for the upcoming fscache mode. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220425122143.56815-11-jefflexu@linux.alibaba.com Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-18 00:11:19 +08:00
Jeffle Xu	94d7894670	erofs: make erofs_map_blocks() generally available ... so that it can be used in the following introduced fscache mode. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20220425122143.56815-10-jefflexu@linux.alibaba.com Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-18 00:11:18 +08:00
Chao Yu	6c459b78d4	erofs: support idmapped mounts This patch enables idmapped mounts for erofs, since all dedicated helpers for this functionality existsm, so, in this patch we just pass down the user_namespace argument from the VFS methods to the relevant helpers. Simple idmap example on erofs image: 1. mkdir dir 2. touch dir/file 3. mkfs.erofs erofs.img dir 4. mount -t erofs -o loop erofs.img /mnt/erofs/ 5. ls -ln /mnt/erofs/ total 0 -rw-rw-r-- 1 1000 1000 0 May 17 15:26 file 6. mount-idmapped --map-mount b:1000:1001:1 /mnt/erofs/ /mnt/scratch_erofs/ 7. ls -ln /mnt/scratch_erofs/ total 0 -rw-rw-r-- 1 1001 1001 0 May 17 15:26 file Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Chao Yu <chao.yu@oppo.com> Link: https://lore.kernel.org/r/20220517104103.3570721-1-chao@kernel.org Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-17 23:56:20 +08:00
Hongnan Li	3e917cc305	erofs: make filesystem exportable Implement export operations in order to make EROFS support accessing inodes with filehandles so that it can be exported via NFS and used by overlayfs. Without this patch, 'exportfs -rv' will report: exportfs: /root/erofs_mp does not support NFS export Also tested with unionmount-testsuite and the testcase below passes now: ./run --ov --erofs --verify hard-link For more details about the testcase, see: https://github.com/amir73il/unionmount-testsuite/pull/6 Signed-off-by: Hongnan Li <hongnan.li@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220425040712.91685-1-hongnan.li@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-17 23:48:54 +08:00
Gao Xiang	dcbe6803ff	erofs: fix buffer copy overflow of ztailpacking feature I got some KASAN report as below: [ 46.959738] ================================================================== [ 46.960430] BUG: KASAN: use-after-free in z_erofs_shifted_transform+0x2bd/0x370 [ 46.960430] Read of size 4074 at addr ffff8880300c2f8e by task fssum/188 ... [ 46.960430] Call Trace: [ 46.960430] <TASK> [ 46.960430] dump_stack_lvl+0x41/0x5e [ 46.960430] print_report.cold+0xb2/0x6b7 [ 46.960430] ? z_erofs_shifted_transform+0x2bd/0x370 [ 46.960430] kasan_report+0x8a/0x140 [ 46.960430] ? z_erofs_shifted_transform+0x2bd/0x370 [ 46.960430] kasan_check_range+0x14d/0x1d0 [ 46.960430] memcpy+0x20/0x60 [ 46.960430] z_erofs_shifted_transform+0x2bd/0x370 [ 46.960430] z_erofs_decompress_pcluster+0xaae/0x1080 The root cause is that the tail pcluster won't be a complete filesystem block anymore. So if ztailpacking is used, the second part of an uncompressed tail pcluster may not be ``rq->pageofs_out``. Fixes: `ab749badf9` ("erofs: support unaligned data decompression") Fixes: `cecf864d3d` ("erofs: support inline data decompression") Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220512115833.24175-1-hsiangkao@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-17 23:38:14 +08:00
Gao Xiang	2833f4bb46	erofs: refine on-disk definition comments Fix some outdated comments and typos, hopefully helpful. Link: https://lore.kernel.org/r/20220506194612.117120-3-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-17 23:38:13 +08:00
Gao Xiang	1f7aa6caef	erofs: remove obsoleted comments Some comments haven't been useful anymore since the code updated. Let's drop them instead. Link: https://lore.kernel.org/r/20220506194612.117120-2-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-05-17 23:38:13 +08:00

1 2 3 4 5

248 Commits