linux/drivers/md
Nikos Tsironis 694cfe7f31 dm thin: Flush data device before committing metadata
The thin provisioning target maintains per thin device mappings that map
virtual blocks to data blocks in the data device.

When we write to a shared block, in case of internal snapshots, or
provision a new block, in case of external snapshots, we copy the shared
block to a new data block (COW), update the mapping for the relevant
virtual block and then issue the write to the new data block.

Suppose the data device has a volatile write-back cache and the
following sequence of events occur:

1. We write to a shared block
2. A new data block is allocated
3. We copy the shared block to the new data block using kcopyd (COW)
4. We insert the new mapping for the virtual block in the btree for that
   thin device.
5. The commit timeout expires and we commit the metadata, that now
   includes the new mapping from step (4).
6. The system crashes and the data device's cache has not been flushed,
   meaning that the COWed data are lost.

The next time we read that virtual block of the thin device we read it
from the data block allocated in step (2), since the metadata have been
successfully committed. The data are lost due to the crash, so we read
garbage instead of the old, shared data.

This has the following implications:

1. In case of writes to shared blocks, with size smaller than the pool's
   block size (which means we first copy the whole block and then issue
   the smaller write), we corrupt data that the user never touched.

2. In case of writes to shared blocks, with size equal to the device's
   logical block size, we fail to provide atomic sector writes. When the
   system recovers the user will read garbage from that sector instead
   of the old data or the new data.

3. Even for writes to shared blocks, with size equal to the pool's block
   size (overwrites), after the system recovers, the written sectors
   will contain garbage instead of a random mix of sectors containing
   either old data or new data, thus we fail again to provide atomic
   sectors writes.

4. Even when the user flushes the thin device, because we first commit
   the metadata and then pass down the flush, the same risk for
   corruption exists (if the system crashes after the metadata have been
   committed but before the flush is passed down to the data device.)

The only case which is unaffected is that of writes with size equal to
the pool's block size and with the FUA flag set. But, because FUA writes
trigger metadata commits, this case can trigger the corruption
indirectly.

Moreover, apart from internal and external snapshots, the same issue
exists for newly provisioned blocks, when block zeroing is enabled.
After the system recovers the provisioned blocks might contain garbage
instead of zeroes.

To solve this and avoid the potential data corruption we flush the
pool's data device **before** committing its metadata.

This ensures that the data blocks of any newly inserted mappings are
properly written to non-volatile storage and won't be lost in case of a
crash.

Cc: stable@vger.kernel.org
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-12-06 11:46:16 -05:00
..
bcache for-5.4/block-2019-09-16 2019-09-17 16:57:47 -07:00
persistent-data dm btree: increase rebalance threshold in __rebalance2() 2019-12-05 15:27:52 -05:00
dm-bio-prison-v1.c dm bio prison: replace spin_lock_irqsave with spin_lock_irq 2019-11-05 14:53:03 -05:00
dm-bio-prison-v1.h
dm-bio-prison-v2.c dm bio prison: replace spin_lock_irqsave with spin_lock_irq 2019-11-05 14:53:03 -05:00
dm-bio-prison-v2.h
dm-bio-record.h
dm-bufio.c dm bufio: introduce a global cache replacement 2019-09-13 17:00:21 -04:00
dm-builtin.c
dm-cache-background-tracker.c
dm-cache-background-tracker.h
dm-cache-block-types.h
dm-cache-metadata.c dm cache metadata: Fix loading discard bitset 2019-04-18 16:18:25 -04:00
dm-cache-metadata.h
dm-cache-policy-internal.h
dm-cache-policy-smq.c dm: remove unnecessary unlikely() around WARN_ON_ONCE() 2018-10-16 14:34:59 -04:00
dm-cache-policy.c
dm-cache-policy.h
dm-cache-target.c dm cache: replace spin_lock_irqsave with spin_lock_irq 2019-11-05 14:53:04 -05:00
dm-clone-metadata.c dm clone metadata: Use a two phase commit 2019-12-05 15:27:54 -05:00
dm-clone-metadata.h dm clone metadata: Use a two phase commit 2019-12-05 15:27:54 -05:00
dm-clone-target.c dm clone: Flush destination device before committing metadata 2019-12-05 17:05:23 -05:00
dm-core.h dm: disable DISCARD if the underlying storage no longer supports it 2019-04-04 15:33:59 -04:00
dm-crypt.c Revert "dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues" 2019-11-20 17:27:39 -05:00
dm-delay.c dm delay: fix a crash when invalid device is specified 2019-04-26 11:29:32 -04:00
dm-dust.c dm dust: add limited write failure mode 2019-11-05 15:25:34 -05:00
dm-era-target.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
dm-exception-store.c
dm-exception-store.h - Improve DM snapshot target's scalability by using finer grained 2019-05-16 15:55:48 -07:00
dm-flakey.c block: Kill gfp_t argument of blkdev_report_zones() 2019-07-11 20:04:37 -06:00
dm-init.c docs: device-mapper: move it to the admin-guide 2019-07-15 11:03:01 -03:00
dm-integrity.c dm integrity: fix excessive alignment of metadata runs 2019-11-15 14:49:16 -05:00
dm-io.c dm: Use kzalloc for all structs with embedded biosets/mempools 2018-06-05 08:47:43 -06:00
dm-ioctl.c dm: introduce DM_GET_TARGET_VERSION 2019-09-16 10:18:01 -04:00
dm-kcopyd.c dm kcopyd: always complete failed jobs 2019-08-15 15:57:39 -04:00
dm-linear.c block: Kill gfp_t argument of blkdev_report_zones() 2019-07-11 20:04:37 -06:00
dm-log-userspace-base.c dm: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c dm log writes: fix incorrect comment about the logged sequence example 2019-07-09 14:13:33 -04:00
dm-log.c
dm-mpath.c dm mpath: remove harmful bio-based optimization 2019-11-26 10:22:46 -05:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c
dm-raid1.c dm raid1: use struct_size() with kzalloc() 2019-08-26 11:05:32 -04:00
dm-raid.c dm raid: Remove unnecessary negation of a shift in raid10_format_to_md_layout 2019-11-07 11:59:38 -05:00
dm-region-hash.c - Error path bug fix for overflow tests (Dan) 2018-06-12 18:28:00 -07:00
dm-round-robin.c
dm-rq.c block: Delay default elevator initialization 2019-09-05 19:52:34 -06:00
dm-rq.h dm: remove unused _rq_tio_cache and _rq_cache 2019-03-05 14:48:50 -05:00
dm-service-time.c
dm-snap-persistent.c
dm-snap-transient.c
dm-snap.c dm snapshot: rework COW throttling to fix deadlock 2019-10-10 09:46:05 -04:00
dm-stats.c dm stats: use struct_size() helper 2019-09-04 09:39:22 -04:00
dm-stats.h
dm-stripe.c dm stripe: use struct_size() in kmalloc() 2019-11-05 14:09:59 -05:00
dm-switch.c dm switch: use struct_size() in kzalloc() 2019-03-05 14:48:51 -05:00
dm-sysfs.c dm: remove legacy request-based IO path 2018-10-11 11:36:09 -04:00
dm-table.c dm table: do not allow request-based DM to stack on partitions 2019-11-05 11:22:52 -05:00
dm-target.c dm mpath: fix missing call of path selector type->end_io 2019-04-25 15:38:52 -04:00
dm-thin-metadata.c dm thin metadata: Add support for a pre-commit callback 2019-12-05 17:05:24 -05:00
dm-thin-metadata.h dm thin metadata: Add support for a pre-commit callback 2019-12-05 17:05:24 -05:00
dm-thin.c dm thin: Flush data device before committing metadata 2019-12-06 11:46:16 -05:00
dm-uevent.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
dm-uevent.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
dm-unstripe.c dm: Check for device sector overflow if CONFIG_LBDAF is not set 2018-12-18 09:02:26 -05:00
dm-verity-fec.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
dm-verity-fec.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
dm-verity-target.c dm verity: add root hash pkcs#7 signature verification 2019-08-23 10:13:14 -04:00
dm-verity-verify-sig.c dm verity: add root hash pkcs#7 signature verification 2019-08-23 10:13:14 -04:00
dm-verity-verify-sig.h dm verity: add root hash pkcs#7 signature verification 2019-08-23 10:13:14 -04:00
dm-verity.h dm verity: add root hash pkcs#7 signature verification 2019-08-23 10:13:14 -04:00
dm-writecache.c dm writecache: handle REQ_FUA 2019-11-05 14:21:40 -05:00
dm-zero.c
dm-zoned-metadata.c dm zoned: reduce overhead of backing device checks 2019-11-07 10:08:36 -05:00
dm-zoned-reclaim.c dm zoned: reduce overhead of backing device checks 2019-11-07 10:08:36 -05:00
dm-zoned-target.c dm zoned: reduce overhead of backing device checks 2019-11-07 10:08:36 -05:00
dm-zoned.h dm zoned: reduce overhead of backing device checks 2019-11-07 10:08:36 -05:00
dm.c dm: make dm_table_find_target return NULL 2019-08-23 10:13:12 -04:00
dm.h dm: make dm_table_find_target return NULL 2019-08-23 10:13:12 -04:00
Kconfig dm: Fix Kconfig indentation 2019-11-20 10:35:31 -05:00
Makefile dm: add clone target 2019-09-12 09:32:31 -04:00
md-bitmap.c md-bitmap: create and destroy wb_info_pool with the change of bitmap 2019-06-20 16:36:00 -07:00
md-bitmap.h md: Avoid namespace collision with bitmap API 2018-08-01 15:49:39 -07:00
md-cluster.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 45 2019-05-24 17:27:12 +02:00
md-cluster.h md-cluster: introduce resync_info_get interface for sanity check 2018-10-18 09:36:35 -07:00
md-faulty.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 47 2019-05-24 17:27:13 +02:00
md-linear.c md raid0/linear: Mark array as 'broken' and fail BIOs if a member is gone 2019-09-03 14:49:28 -07:00
md-linear.h
md-multipath.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 47 2019-05-24 17:27:13 +02:00
md-multipath.h md: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
md.c md: add feature flag MD_FEATURE_RAID0_LAYOUT 2019-09-13 13:10:06 -07:00
md.h md raid0/linear: Mark array as 'broken' and fail BIOs if a member is gone 2019-09-03 14:49:28 -07:00
raid0.c md/raid0: fix warning message for parameter default_layout 2019-10-16 09:43:02 -07:00
raid0.h md/raid0: avoid RAID0 data corruption due to layout confusion. 2019-09-13 13:10:05 -07:00
raid1-10.c md: raid1-10: Unify r{1,10}bio_pool_free 2019-06-15 01:37:35 -06:00
raid1.c md/raid1: fail run raid1 array when active disk less than one 2019-09-03 14:52:03 -07:00
raid1.h md: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
raid5-cache.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288 2019-06-05 17:36:37 +02:00
raid5-log.h raid5: set write hint for PPL 2019-03-12 10:15:18 -07:00
raid5-ppl.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288 2019-06-05 17:36:37 +02:00
raid5.c raid5: remove STRIPE_OPS_REQ_PENDING 2019-09-13 13:14:39 -07:00
raid5.h raid5: use bio_end_sector in r5_next_bio 2019-09-13 13:14:43 -07:00
raid10.c md: allow last device to be forcibly removed from RAID1/RAID10. 2019-08-07 10:25:02 -07:00
raid10.h md: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00