linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-26 14:12:06 +00:00

Author	SHA1	Message	Date
Filipe Manana	7dc66abb5a	btrfs: use a dedicated data structure for chunk maps Currently we abuse the extent_map structure for two purposes: 1) To actually represent extents for inodes; 2) To represent chunk mappings. This is odd and has several disadvantages: 1) To create a chunk map, we need to do two memory allocations: one for an extent_map structure and another one for a map_lookup structure, so more potential for an allocation failure and more complicated code to manage and link two structures; 2) For a chunk map we actually only use 3 fields (24 bytes) of the respective extent map structure: the 'start' field to have the logical start address of the chunk, the 'len' field to have the chunk's size, and the 'orig_block_len' field to contain the chunk's stripe size. Besides wasting a memory, it's also odd and not intuitive at all to have the stripe size in a field named 'orig_block_len'. We are also using 'block_len' of the extent_map structure to contain the chunk size, so we have 2 fields for the same value, 'len' and 'block_len', which is pointless; 3) When an extent map is associated to a chunk mapping, we set the bit EXTENT_FLAG_FS_MAPPING on its flags and then make its member named 'map_lookup' point to the associated map_lookup structure. This means that for an extent map associated to an inode extent, we are not using this 'map_lookup' pointer, so wasting 8 bytes (on a 64 bits platform); 4) Extent maps associated to a chunk mapping are never merged or split so it's pointless to use the existing extent map infrastructure. So add a dedicated data structure named 'btrfs_chunk_map' to represent chunk mappings, this is basically the existing map_lookup structure with some extra fields: 1) 'start' to contain the chunk logical address; 2) 'chunk_len' to contain the chunk's length; 3) 'stripe_size' for the stripe size; 4) 'rb_node' for insertion into a rb tree; 5) 'refs' for reference counting. This way we do a single memory allocation for chunk mappings and we don't waste memory for them with unused/unnecessary fields from an extent_map. We also save 8 bytes from the extent_map structure by removing the 'map_lookup' pointer, so the size of struct extent_map is reduced from 144 bytes down to 136 bytes, and we can now have 30 extents map per 4K page instead of 28. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-12-15 20:27:02 +01:00
Qu Wenruo	3a3c7a7f65	btrfs: raid56: remove unused BTRFS_RBIO_REBUILD_MISSING Commit `aca43fe839` ("btrfs: remove unused raid56 functions which were dedicated for scrub") removed the special handling of RAID56 scrub for missing device. As scrub goes full mirror_num based recovery, that means if it hits a missing device in RAID56, it would just try the next mirror, which would go through the BTRFS_RBIO_READ_REBUILD operation. This means there is no longer any use of BTRFS_RBIO_REBUILD_MISSING operation and we can safely remove it. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-21 14:52:12 +02:00
Qu Wenruo	94ead93e63	btrfs: scrub: use recovered data stripes as cache to avoid unnecessary read For P/Q stripe scrub, we have quite some duplicated read IO: - Data stripes read for verification This is triggered by the scrub_submit_initial_read() inside scrub_raid56_parity_stripe(). - Data stripes read (again) for P/Q stripe verification This is triggered by scrub_assemble_read_bios() from scrub_rbio(). Although we can have hit rbio cache and avoid unnecessary read, the chance is very low, as scrub would easily flush the whole rbio cache. This means, even we're just scrubbing a single P/Q stripe, we would read the data stripes twice for the best case scenario. If we need to recover some data stripes, it would cause more reads on the same data stripes, again and again. However before we call raid56_parity_submit_scrub_rbio() we already have all data stripes repaired and their contents ready to use. But RAID56 cache is unaware about the scrub cache, thus RAID56 layer itself still needs to re-read the data stripes. To avoid such cache miss, this patch would: - Introduce a new helper, raid56_parity_cache_data_pages() This function would grab the pages from an array, and copy the content to the rbio, marking all the involved sectors uptodate. The page copy is unavoidable because of the cache pages of rbio are all self managed, thus can not utilize outside pages without screwing up the lifespan. - Use the repaired data stripes as cache inside scrub_raid56_parity_stripe() By this, we ensure all the data sectors of the scrub rbio are already uptodate, and no need to read them again from disk. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-06-19 13:59:24 +02:00
Qu Wenruo	aca43fe839	btrfs: remove unused raid56 functions which were dedicated for scrub Since the scrub rework, the following RAID56 functions are no longer called: - raid56_add_scrub_pages() - raid56_alloc_missing_rbio() - raid56_submit_missing_rbio() Those functions are all utilized by scrub to handle missing device cases for RAID56. However the new scrub code handle them in a completely different way: - If it's data stripe, go recovery path through btrfs_submit_bio() - If it's P/Q stripe, it would be handled through raid56_parity_submit_scrub_rbio() And that function would handle dev-replace and repair properly. Thus we can safely remove those functions. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 19:52:18 +02:00
Qu Wenruo	4886ff7b50	btrfs: introduce a new helper to submit write bio for repair Both scrub and read-repair are utilizing a special repair writes that: - Only writes back to a single device Even for read-repair on RAID56, we only update the corrupted data stripe itself, not triggering the full RMW path. - Requires a valid @mirror_num For RAID56 case, only @mirror_num == 1 is valid. For non-RAID56 cases, we need @mirror_num to locate our stripe. - No data csum generation needed These two call sites still have some differences though: - Read-repair goes plain bio It doesn't need a full btrfs_bio, and goes submit_bio_wait(). - New scrub repair would go btrfs_bio To simplify both read and write path. So here this patch would: - Introduce a common helper, btrfs_map_repair_block() Due to the single device nature, we can use an on-stack btrfs_io_stripe to pass device and its physical bytenr. - Introduce a new interface, btrfs_submit_repair_bio(), for later scrub code This is for the incoming scrub code. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:23 +02:00
Colin Ian King	67da05b3f2	btrfs: fix spelling mistakes found using codespell There quite a few spelling mistakes as found using codespell. Fix them. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-02-15 19:38:50 +01:00
Qu Wenruo	c5a415627b	btrfs: raid56: prepare data checksums for later RMW verification This is for later data checksum verification at RMW time. This patch will try to allocate the needed memory for a locked rbio if the rbio is for data exclusively (we don't want to handle mixed bg yet). The memory will be released when the rbio is finished. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-12-05 18:00:57 +01:00
Qu Wenruo	ad3daf1c3f	btrfs: raid56: remove the old error tracking system Since all the recovery paths have been migrated to the new error bitmap based system, we can remove the old stripe number based system. This cleanup involves one behavior change: - Rebuild rbio can no longer be merged Previously a rebuild rbio (caused by retry after data csum mismatch) can be merged, if the error happens in the same stripe. But with the new error bitmap based solution, it's much harder to compare error bitmaps. So here we just don't merge rebuild rbio at all. This may introduce some performance impact at extreme corner cases, but we're willing to take it. Other than that, this patch will cleanup the following members: - rbio::faila - rbio::failb They will be replaced by per-vertical stripe check, which is more accurate. - rbio::error It will be replace by per-vertical stripe error bitmap check. - Allow get_rbio_vertical_errors() to accept NULL pointers for @faila and @failb Some call sites only want to check if we have errors beyond the tolerance. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-12-05 18:00:55 +01:00
Qu Wenruo	2942a50dea	btrfs: raid56: introduce btrfs_raid_bio::error_bitmap Currently btrfs raid56 uses btrfs_raid_bio::faila and failb to indicate which stripe(s) had IO errors. But that has some problems: - If one sector failed csum check, the whole stripe where the corruption is will be marked error. This can reduce the chance we do recover, like this: 0 4K 8K Data 1 \|XX\| \| Data 2 \| \|XX\| Parity \| \| \| In above case, 0~4K in data 1 should be recovered using data 2 and parity, while 4K~8K in data 2 should be recovered using data 1 and parity. Currently if we trigger read on 0~4K of data 1, we will also recover 4K~8K of data 1 using corrupted data 2 and parity, causing wrong result in rbio cache. - Harder to expand for future M-N scheme As we're limited to just faila/b, two corruptions. - Harder to expand to handle extra csum errors This can be problematic if we start to do csum verification. This patch will introduce an extra @error_bitmap, where one bit represents error that happened for that sector. The choice to introduce a new error bitmap other than reusing sector_ptr, is to avoid extra search between rbio::stripe_sectors[] and rbio::bio_sectors[]. Since we can submit bio using sectors from both sectors, doing proper search on both array will more complex. Although the new bitmap will take extra memory, later we can remove things like @error and faila/b to save some memory. Currently the new error bitmap and failab mechanism coexists, the error bitmap is only updated at endio time and recover entrance. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-12-05 18:00:55 +01:00
Qu Wenruo	1a1a285139	btrfs: remove the unused endio_raid56_workers and btrfs_raid_bio::end_io_work Since we have switched all raid56 workload to submit-and-wait method, there is no use for btrfs_fs_info::endio_raid56_workers workqueue and btrfs_raid_bio::end_io_work. Remove them to save some memory. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-12-05 18:00:49 +01:00
Qu Wenruo	93723095b5	btrfs: raid56: switch write path to rmw_rbio() This includes the following changes: - Implement new raid_unplug() functions Now we don't need a workqueue to run the plug, as all our work is just queue rmw_rbio_work() call, which can be executed without sleep. - Implement a rmw_rbio_work_locked() helper This is for unlock_stripe(), which is already holding the full stripe lock. - Remove all the old functions This should already shows how complex the old functions are, as we ended up removing the following functions: * rmw_work() * validate_rbio_for_rmw() * raid56_rmw_end_io_work() * raid56_rmw_stripe() * full_stripe_write() * partial_stripe_write() * __raid56_parity_write() * run_plug() * unplug_work() * btrfs_raid_unplug() * rmw_work() * __raid56_parity_recover() * raid_recover_end_io_work() - Unexport rmw_rbio() Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-12-05 18:00:49 +01:00
Qu Wenruo	5eb30ee26f	btrfs: raid56: introduce the main entrance for RMW path The new entrance will be called rmw_rbio(), it will have a streamlined workflow by using submit-and-wait method. Thus there will be no weird jumps between tons of functions, thus way more reader friendly, and will make later expansion easier, as it's now a straight workflow, the timing is way more clear. Unfortunately we can not yet migrate the RMW path to use this new entrance as we still need extra work to address the plug and unlock_stripe() function. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-12-05 18:00:49 +01:00
Qu Wenruo	d817ce35d2	btrfs: raid56: switch recovery path to a single function Currently btrfs uses end_io functions to jump between different stages of recovery. For example, we go the following different functions: - raid56_bio_end_io() This handles the read for all the sectors (except the missing device). - __raid_recover_end_io() This does the real work, it's called inside the delayed work function raid_recover_end_io_work(). This one recovery path involves at least 3 different functions, which is a big burden for readers. This patch will change the behavior by: - Introduce a unified recovery entrance, recover_rbio() - Use submit-and-wait method So the workflow is not interrupted by the endio function jump. This doesn't bring performance change, but reduce the burden for reviewers. - Run the main function in the rmw_workers workqueue Now raid56_parity_recover() only needs to setup the work, and queue the work using start_async_work(). Now readers only need to do one function jump (start_async_work()) to find out the main entrance of recovery path. Furthermore, recover_rbio() function can easily be reused by other paths. The old recovery path is still utilized by degraded write path. It will be cleaned up when we have migrated the write path. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-12-05 18:00:49 +01:00
Christoph Hellwig	f1c2937976	btrfs: properly abstract the parity raid bio handling The parity raid write/recover functionality is currently not very well abstracted from the bio submission and completion handling in volumes.c: - the raid56 code directly completes the original btrfs_bio fed into btrfs_submit_bio instead of dispatching back to volumes.c - the raid56 code consumes the bioc and bio_counter references taken by volumes.c, which also leads to special casing of the calls from the scrub code into the raid56 code To fix this up supply a bi_end_io handler that calls back into the volumes.c machinery, which then puts the bioc, decrements the bio_counter and completes the original bio, and updates the scrub code to also take ownership of the bioc and bio_counter in all cases. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2022-09-26 12:27:59 +02:00
Christoph Hellwig	6065fd95da	btrfs: do not return errors from raid56_parity_recover Always consume the bio and call the end_io handler on error instead of returning an error and letting the caller handle it. This matches what the block layer submission does and avoids any confusion on who needs to handle errors. Also use the proper bool type for the generic_io argument. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2022-07-25 17:45:39 +02:00
Christoph Hellwig	31683f4aae	btrfs: do not return errors from raid56_parity_write Always consume the bio and call the end_io handler on error instead of returning an error and letting the caller handle it. This matches what the block layer submission does and avoids any confusion on who needs to handle errors. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2022-07-25 17:45:39 +02:00
Christoph Hellwig	ff18a4afeb	btrfs: raid56: use fixed stripe length everywhere The raid56 code assumes a fixed stripe length BTRFS_STRIPE_LEN but there are functions passing it as arguments, this is not necessary. The fixed value has been used for a long time and though the stripe length should be configurable by super block member stripesize, this hasn't been implemented and would require more changes so we don't need to keep this code around until then. Partially based on a patch from Qu Wenruo. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> [ update changelog ] Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-07-25 17:45:39 +02:00
Qu Wenruo	0b30f71945	btrfs: use btrfs_raid_array to calculate number of parity stripes Use the raid table instead of hard coded values and rename the helper as it is exported. This could make later extension on RAID56 based profiles easier. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-07-25 17:45:36 +02:00
Christoph Hellwig	d34e123de1	btrfs: defer I/O completion based on the btrfs_raid_bio Instead of attaching an extra allocation an indirect call to each low-level bio issued by the RAID code, add a work_struct to struct btrfs_raid_bio and only defer the per-rbio completion action. The per-bio action for all the I/Os are trivial and can be safely done from interrupt context. As a nice side effect this also allows sharing the boilerplate code for the per-bio completions Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2022-07-25 17:45:33 +02:00
Qu Wenruo	b8bea09a45	btrfs: add trace event for submitted RAID56 bio Add tracepoint for better insight to how the RAID56 data are submitted. The output looks like this: (trace event header and UUID skipped) raid56_read_partial: full_stripe=389152768 devid=3 type=DATA1 offset=32768 opf=0x0 physical=323059712 len=32768 raid56_read_partial: full_stripe=389152768 devid=1 type=DATA2 offset=0 opf=0x0 physical=67174400 len=65536 raid56_write_stripe: full_stripe=389152768 devid=3 type=DATA1 offset=0 opf=0x1 physical=323026944 len=32768 raid56_write_stripe: full_stripe=389152768 devid=2 type=PQ1 offset=0 opf=0x1 physical=323026944 len=32768 The above debug output is from a 32K data write into an empty RAID56 data chunk. Some explanation on the event output: full_stripe: the logical bytenr of the full stripe devid: btrfs devid type: raid stripe type. DATA1: the first data stripe DATA2: the second data stripe PQ1: the P stripe PQ2: the Q stripe offset: the offset inside the stripe. opf: the bio op type physical: the physical offset the bio is for len: the length of the bio The first two lines are from partial RMW read, which is reading the remaining data stripes from disks. The last two lines are for full stripe RMW write, which is writing the involved two 16K stripes (one for DATA1 stripe, one for P stripe). The stripe for DATA2 doesn't need to be written. There are 5 types of trace events: - raid56_read_partial Read remaining data for regular read/write path. - raid56_write_stripe Write the modified stripes for regular read/write path. - raid56_scrub_read_recover Read remaining data for scrub recovery path. - raid56_scrub_write_stripe Write the modified stripes for scrub path. - raid56_scrub_read Read remaining data for scrub path. Also, since the trace events are included at super.c, we have to export needed structure definitions to 'raid56.h' and include the header in super.c, or we're unable to access those members. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ reformat comments ] Signed-off-by: David Sterba <dsterba@suse.com>	2022-07-25 17:44:34 +02:00
Qu Wenruo	6346f6bf16	btrfs: raid56: make raid56_add_scrub_pages() subpage compatible This requires one extra parameter @pgoff for the function. In the current code base, scrub is still one page per sector, thus the new parameter will always be 0. It needs the extra subpage scrub optimization code to fully take advantage. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-05-16 17:03:15 +02:00
Qu Wenruo	cc353a8be2	btrfs: reduce width for stripe_len from u64 to u32 Currently btrfs uses fixed stripe length (64K), thus u32 is wide enough for the usage. Furthermore, even in the future we choose to enlarge stripe length to larger values, I don't believe we would want stripe as large as 4G or larger. So this patch will reduce the width for all in-memory structures and parameters, this involves: - RAID56 related function argument lists This allows us to do direct division related to stripe_len. Although we will use bits shift to replace the division anyway. - btrfs_io_geometry structure This involves one change to simplify the calculation of both @stripe_nr and @stripe_offset, using div64_u64_rem(). And add extra sanity check to make sure @stripe_offset is always small enough for u32. This saves 8 bytes for the structure. - map_lookup structure This convert @stripe_len to u32, which saves 8 bytes. (saved 4 bytes, and removed a 4-bytes hole) Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-05-16 17:03:14 +02:00
Qu Wenruo	6a258d725d	btrfs: remove btrfs_raid_bio::fs_info member We can grab fs_info reliably from btrfs_raid_bio::bioc, as the bioc is always passed into alloc_rbio(), and only get released when the raid bio is released. Remove btrfs_raid_bio::fs_info member, and cleanup all the @fs_info parameters for alloc_rbio() callers. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-26 19:08:03 +02:00
Qu Wenruo	4c66461179	btrfs: rename btrfs_bio to btrfs_io_context The structure btrfs_bio is used by two different sites: - bio->bi_private for mirror based profiles For those profiles (SINGLE/DUP/RAID1*/RAID10), this structures records how many mirrors are still pending, and save the original endio function of the bio. - RAID56 code In that case, RAID56 only utilize the stripes info, and no long uses that to trace the pending mirrors. So btrfs_bio is not always bind to a bio, and contains more info for IO context, thus renaming it will make the naming less confusing. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-10-26 19:08:02 +02:00
David Sterba	72ad813157	btrfs: constify map parameter for nr_parity_stripes and nr_data_stripes Signed-off-by: David Sterba <dsterba@suse.com>	2019-07-01 13:34:58 +02:00
David Sterba	9888c3402c	btrfs: replace GPL boilerplate by SPDX -- headers Remove GPL boilerplate text (long, short, one-line) and keep the rest, ie. personal, company or original source copyright statements. Add the SPDX header. Unify the include protection macros to match the file names. Signed-off-by: David Sterba <dsterba@suse.com>	2018-04-12 16:29:46 +02:00
Jeff Mahoney	2ff7e61e0d	btrfs: take an fs_info directly when the root is not used otherwise There are loads of functions in btrfs that accept a root parameter but only use it to obtain an fs_info pointer. Let's convert those to just accept an fs_info pointer directly. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:06:59 +01:00
Omar Sandoval	b4ee178268	Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation The current RAID 5/6 recovery code isn't quite prepared to handle missing devices. In particular, it expects a bio that we previously attempted to use in the read path, meaning that it has valid pages allocated. However, missing devices have a NULL blkdev, and we can't call bio_add_page() on a bio with a NULL blkdev. We could do manual manipulation of bio->bi_io_vec, but that's pretty gross. So instead, add a separate path that allows us to manually add pages to the rbio. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-08-09 07:34:26 -07:00
Zhao Lei	8e5cfb55d3	Btrfs: Make raid_map array be inlined in btrfs_bio structure It can make code more simple and clear, we need not care about free bbio and raid_map together. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-01-21 18:06:47 -08:00
Miao Xie	4245215d6a	Btrfs, raid56: fix use-after-free problem in the final device replace procedure on raid56 The commit `c404e0dc` (Btrfs: fix use-after-free in the finishing procedure of the device replace) fixed a use-after-free problem which happened when removing the source device at the end of device replace, but at that time, btrfs didn't support device replace on raid56, so we didn't fix the problem on the raid56 profile. Currently, we implemented device replace for raid56, so we need kick that problem out before we enable that function for raid56. The fix method is very simple, we just increase the bio per-cpu counter before we submit a raid56 io, and decrease the counter when the raid56 io ends. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>	2014-12-03 10:18:47 +08:00
Miao Xie	5a6ac9eacb	Btrfs, raid56: support parity scrub on raid56 The implementation is: - Read and check all the data with checksum in the same stripe. All the data which has checksum is COW data, and we are sure that it is not changed though we don't lock the stripe. because the space of that data just can be reclaimed after the current transction is committed, and then the fs can use it to store the other data, but when doing scrub, we hold the current transaction, that is that data can not be recovered, it is safe that read and check it out of the stripe lock. - Lock the stripe - Read out all the data without checksum and parity The data without checksum and the parity may be changed if we don't lock the stripe, so we need read it in the stripe lock context. - Check the parity - Re-calculate the new parity and write back it if the old parity is not right - Unlock the stripe If we can not read out the data or the data we read is corrupted, we will try to repair it. If the repair fails. we will mark the horizontal sub-stripe(pages on the same horizontal) as corrupted sub-stripe, and we will skip the parity check and repair of that horizontal sub-stripe. And in order to skip the horizontal sub-stripe that has no data, we introduce a bitmap. If there is some data on the horizontal sub-stripe, we will the relative bit to 1, and when we check and repair the parity, we will skip those horizontal sub-stripes that the relative bits is 0. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>	2014-12-03 10:18:45 +08:00
Miao Xie	af8e2d1df9	Btrfs, scrub: repair the common data on RAID5/6 if it is corrupted This patch implement the RAID5/6 common data repair function, the implementation is similar to the scrub on the other RAID such as RAID1, the differentia is that we don't read the data from the mirror, we use the data repair function of RAID5/6. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>	2014-12-03 10:18:45 +08:00
David Woodhouse	53b381b3ab	Btrfs: RAID5 and RAID6 This builds on David Woodhouse's original Btrfs raid5/6 implementation. The code has changed quite a bit, blame Chris Mason for any bugs. Read/modify/write is done after the higher levels of the filesystem have prepared a given bio. This means the higher layers are not responsible for building full stripes, and they don't need to query for the topology of the extents that may get allocated during delayed allocation runs. It also means different files can easily share the same stripe. But, it does expose us to incorrect parity if we crash or lose power while doing a read/modify/write cycle. This will be addressed in a later commit. Scrub is unable to repair crc errors on raid5/6 chunks. Discard does not work on raid5/6 (yet) The stripe size is fixed at 64KiB per disk. This will be tunable in a later commit. Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-02-01 14:24:23 -05:00

33 Commits