linux/drivers/md
Anssi Hannula 40aa978ecc dm cache: fix race causing dirty blocks to be marked as clean
When a writeback or a promotion of a block is completed, the cell of
that block is removed from the prison, the block is marked as clean, and
the clear_dirty() callback of the cache policy is called.

Unfortunately, performing those actions in this order allows an incoming
new write bio for that block to come in before clearing the dirty status
is completed and therefore possibly causing one of these two scenarios:

Scenario A:

Thread 1                      Thread 2
cell_defer()                  .
- cell removed from prison    .
- detained bios queued        .
.                             incoming write bio
.                             remapped to cache
.                             set_dirty() called,
.                               but block already dirty
.                               => it does nothing
clear_dirty()                 .
- block marked clean          .
- policy clear_dirty() called .

Result: Block is marked clean even though it is actually dirty. No
writeback will occur.

Scenario B:

Thread 1                      Thread 2
cell_defer()                  .
- cell removed from prison    .
- detained bios queued        .
clear_dirty()                 .
- block marked clean          .
.                             incoming write bio
.                             remapped to cache
.                             set_dirty() called
.                             - block marked dirty
.                             - policy set_dirty() called
- policy clear_dirty() called .

Result: Block is properly marked as dirty, but policy thinks it is clean
and therefore never asks us to writeback it.
This case is visible in "dmsetup status" dirty block count (which
normally decreases to 0 on a quiet device).

Fix these issues by calling clear_dirty() before calling cell_defer().
Incoming bios for that block will then be detained in the cell and
released only after clear_dirty() has completed, so the race will not
occur.

Found by inspecting the code after noticing spurious dirty counts
(scenario B).

Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2014-09-10 11:20:47 -04:00
..
bcache bcache: Drop unneeded blk_sync_queue() calls 2014-08-04 15:23:04 -07:00
persistent-data dm transaction manager: fix corruption due to non-atomic transaction commit 2014-03-27 16:56:23 -04:00
bitmap.c md/bitmap: remove confusing code from filemap_get_page. 2014-05-29 16:59:47 +10:00
bitmap.h kernfs: s/sysfs_dirent/kernfs_node/ and rename its friends accordingly 2013-12-11 15:28:36 -08:00
dm-bio-prison.c dm bio prison: implement per bucket locking in the dm_bio_prison hash table 2014-06-11 16:48:54 -04:00
dm-bio-prison.h dm thin: return ENOSPC instead of EIO when error_if_no_space enabled 2014-06-03 13:44:08 -04:00
dm-bio-record.h dm: Refactor for new bio cloning/splitting 2013-11-23 22:33:55 -08:00
dm-bufio.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-08-04 16:23:30 -07:00
dm-bufio.h dm snapshot: use dm-bufio prefetch 2014-01-14 23:23:03 -05:00
dm-builtin.c dm sysfs: fix a module unload race 2014-01-14 23:23:04 -05:00
dm-cache-block-types.h dm cache: remove remainder of distinct discard block size 2014-03-27 16:56:23 -04:00
dm-cache-metadata.c dm cache metadata: use dm-space-map-metadata.h defined size limits 2014-08-01 12:30:33 -04:00
dm-cache-metadata.h dm cache metadata: use dm-space-map-metadata.h defined size limits 2014-08-01 12:30:33 -04:00
dm-cache-policy-cleaner.c dm cache: policy change version from string to integer set 2013-03-20 17:21:27 +00:00
dm-cache-policy-internal.h dm cache: add remove_cblock method to policy interface 2013-11-11 11:37:50 -05:00
dm-cache-policy-mq.c dm cache mq: fix memory allocation failure for large cache devices 2014-02-28 12:18:29 -05:00
dm-cache-policy.c dm cache: add policy name to status output 2014-01-16 13:44:11 -05:00
dm-cache-policy.h dm cache: add policy name to status output 2014-01-16 13:44:11 -05:00
dm-cache-target.c dm cache: fix race causing dirty blocks to be marked as clean 2014-09-10 11:20:47 -04:00
dm-crypt.c dm crypt: fix access beyond the end of allocated space 2014-08-28 14:24:09 -04:00
dm-delay.c Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block 2014-01-30 11:19:05 -08:00
dm-era-target.c dm era: check for a non-NULL metadata object before closing it 2014-06-03 13:44:08 -04:00
dm-exception-store.c dm: replace simple_strtoul 2012-07-27 15:07:59 +01:00
dm-exception-store.h
dm-flakey.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
dm-io.c dm io: simplify dec_count and sync_io 2014-08-01 12:30:30 -04:00
dm-ioctl.c dm: allow remove to be deferred 2013-11-09 18:20:22 -05:00
dm-kcopyd.c dm: stop using WQ_NON_REENTRANT 2013-08-23 09:02:13 -04:00
dm-linear.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
dm-log-userspace-base.c dm log userspace: allow mark requests to piggyback on flush requests 2014-01-21 23:46:27 -05:00
dm-log-userspace-transfer.c connector: add portid to unicast in addition to broadcasting 2014-02-07 15:40:17 -08:00
dm-log-userspace-transfer.h
dm-log.c dm: use memweight() 2012-07-30 17:25:16 -07:00
dm-mpath.c dm mpath: eliminate pg_ready() wrapper 2014-08-01 12:30:31 -04:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c dm: reject trailing characters in sccanf input 2012-03-28 18:41:26 +01:00
dm-raid1.c dm raid1: fix immutable biovec related BUG when retrying read bio 2014-02-18 10:48:57 -05:00
dm-raid.c MD: Remember the last sync operation that was performed 2013-06-26 12:38:24 +10:00
dm-region-hash.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
dm-round-robin.c dm: reject trailing characters in sccanf input 2012-03-28 18:41:26 +01:00
dm-service-time.c dm: reject trailing characters in sccanf input 2012-03-28 18:41:26 +01:00
dm-snap-persistent.c dm snapshot: fix metadata corruption 2014-03-03 17:58:13 -05:00
dm-snap-transient.c
dm-snap.c sched: Remove proliferation of wait_on_bit() action functions 2014-07-16 15:10:39 +02:00
dm-stats.c dm stats: initialize read-only module parameter 2013-12-10 19:13:21 -05:00
dm-stats.h dm: add statistics support 2013-09-05 20:46:06 -04:00
dm-stripe.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
dm-switch.c dm switch: efficiently support repetitive patterns 2014-08-01 12:30:37 -04:00
dm-sysfs.c dm sysfs: fix a module unload race 2014-01-14 23:23:04 -05:00
dm-table.c dm table: propagate QUEUE_FLAG_NO_SG_MERGE 2014-08-10 20:54:49 -04:00
dm-target.c dm: allow error target to replace bio-based and request-based targets 2013-09-05 20:46:05 -04:00
dm-thin-metadata.c dm thin metadata: do not allow the data block size to change 2014-07-15 14:05:26 -04:00
dm-thin-metadata.h dm thin: ensure user takes action to validate data and metadata consistency 2014-03-05 15:25:35 -05:00
dm-thin.c dm thin: set minimum_io_size to pool's data block size 2014-08-01 12:30:35 -04:00
dm-uevent.c
dm-uevent.h
dm-verity.c dm verity: fix biovecs hash calculation regression 2014-04-15 12:19:24 -04:00
dm-zero.c dm crypt, dm zero: update author name following legal name change 2014-07-10 16:44:14 -04:00
dm.c dm: allocate a special workqueue for deferred device removal 2014-07-10 16:44:13 -04:00
dm.h dm table: make dm_table_supports_discards static 2014-08-01 12:30:34 -04:00
faulty.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
Kconfig dm: add era target 2014-03-27 16:56:23 -04:00
linear.c block: Introduce new bio_split() 2013-11-23 22:33:57 -08:00
linear.h
Makefile dm: add era target 2014-03-27 16:56:23 -04:00
md.c md: don't allow bitmap file to be added to raid0/linear. 2014-08-08 15:43:20 +10:00
md.h md/bitmap: don't abuse i_writecount for bitmap files. 2014-04-09 12:26:59 +10:00
multipath.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
multipath.h
raid0.c md/raid0: check for bitmap compatability when changing raid levels. 2014-08-08 15:33:17 +10:00
raid0.h
raid1.c md/raid1,raid10: always abort recover on write error. 2014-07-31 10:16:52 +10:00
raid1.h raid1: Rewrite the implementation of iobarrier. 2013-11-19 15:19:18 +11:00
raid5.c md/raid6: avoid data corruption during recovery of double-degraded RAID6 2014-08-18 14:49:46 +10:00
raid5.h raid5: add an option to avoid copy data from bio to stripe cache 2014-05-29 16:59:47 +10:00
raid10.c md/raid10: always initialise ->state on newly allocated r10_bio 2014-08-19 17:20:27 +10:00
raid10.h MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1) 2013-02-26 11:55:30 +11:00