Commit Graph

863 Commits

Author SHA1 Message Date
Linus Torvalds
6aaf0da872 md updates for 4.2
A mixed bag
  - a few bug fixes
  - some performance improvement that decrease lock contention
  - some clean-up
 
 Nothing major.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJVi6weAAoJEDnsnt1WYoG50CsP/RqFbZicRSIvzXUURwP+yCP0
 3YZuURj4IXC6Cy/HLX+bZoj1p/b+GIRsZ72fWFJrd2LheaAI6WojCCLlnmXUtI/Y
 LIppF8/A2hfCNbF9cILByvrbzfndeEGK8kvootBDpvD0jlYiGePPAMQY2zx0MAyb
 T4yJ/KiziLniP6x7vqZrQ6I1MRVjeanN6RWXktFtixMpNOKUJe3PiZbUz4VDIrHR
 DaiHCbMjvRIkUWgNY8HmijEt+c8AYia7muqLj359dy2xF1hlUIdCx+61cgFD1zd8
 enKDH3xp+3B9BEgHe+AtxTAzpqSgU93tdhUjGcy/orA+yYjAAcA4ifngrzfE3VKb
 kwQgPh2JvUrubavrcto0hthS5RldrCpDXebOM4aEq+7lDHCwrZ39Qio5+1F7TLt5
 A5E3Eb7dPRdp9T3LrluX8/f7bO/Wbmxvv/RwnSLTpnGQoBWIAqCpQ+e9ro446Gsx
 /phXv3tE78fKj88LgQY/mm8ICeCppmQGLrpmjk9bkaZzqFdzQoURVmPh8QPMuJB4
 iMHpOOKLzrUlW/23rRxaIKwPuFyxlNuLAvyA3ezsymGiZ+SqSeFCEm1jN64EfMCI
 39rpfZt2pcVVOZJ9YeuzZG9wpie96yGZgnVWlP3FPjqRpboXqmtHlYA6EMRtqDAy
 mjSiGDF2bxkT1/YcjELD
 =sXTI
 -----END PGP SIGNATURE-----

Merge tag 'md/4.2' of git://neil.brown.name/md

Pull md updates from Neil Brown:
 "A mixed bag

   - a few bug fixes
   - some performance improvement that decrease lock contention
   - some clean-up

  Nothing major"

* tag 'md/4.2' of git://neil.brown.name/md:
  md: clear Blocked flag on failed devices when array is read-only.
  md: unlock mddev_lock on an error path.
  md: clear mddev->private when it has been freed.
  md: fix a build warning
  md/raid5: ignore released_stripes check
  md/raid5: per hash value and exclusive wait_for_stripe
  md/raid5: split wait_for_stripe and introduce wait_for_quiescent
  wait: introduce wait_event_exclusive_cmd
  md: convert to kstrto*()
  md/raid10: make sync_request_write() call bio_copy_data()
2015-06-29 11:10:56 -07:00
Rasmus Villemoes
90a9befb20 drivers/md/md.c: use strreplace()
There's no point in starting over when we meet a '/'.  This also
eliminates a stack variable and a little .text.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:40 -07:00
Neil Brown
ab16bfc732 md: clear Blocked flag on failed devices when array is read-only.
The Blocked flag indicates that a device has failed but that this
fact hasn't been recorded in the metadata yet.  Writes to such
devices cannot be allowed until the metadata has been updated.

On a read-only array, the Blocked flag will never be cleared.
This prevents the device being removed from the array.

If the metadata is being handled by the kernel
(i.e. !mddev->external), then we can be sure that if the array is
switch to writable, then a metadata update will happen and will
record the failure.  So we don't need the flag set.

If metadata is externally managed, it is upto the external manager
to clear the 'blocked' flag.

Reported-by: XiaoNi <xni@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-06-25 17:16:49 +10:00
NeilBrown
9a8c0fa861 md: unlock mddev_lock on an error path.
This error path retuns while still holding the lock - bad.

Fixes: 6791875e2e ("md: make reconfig_mutex optional for writes to md sysfs files.")
Cc: stable@vger.kernel.org (v4.0+)
Signed-off-by: NeilBrown <neilb@suse.com>
2015-06-25 17:14:09 +10:00
NeilBrown
bd6919228d md: clear mddev->private when it has been freed.
If ->private is set when ->run is called, it is assumed to be
a 'config'  prepared as part of 'reshape'.

So it is important when we free that config, that we also clear ->private.
This is not often a problem as the mddev will normally be discarded
shortly after the config us freed.
However if an 'assemble' races with a final close, the assemble can use
the old mddev which has a stale ->private.  This leads to any of
various sorts of crashes.

So clear ->private after calling ->free().

Reported-by: Nate Clark <nate@neworld.us>
Cc: stable@vger.kernel.org (v4.0+)
Fixes: afa0f557cb ("md: rename ->stop to ->free")
Signed-off-by: NeilBrown <neilb@suse.com>
2015-06-25 17:14:09 +10:00
Miklos Szeredi
9bf39ab2ad vfs: add file_path() helper
Turn
	d_path(&file->f_path, ...);
into
	file_path(file, ...);

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:00:05 -04:00
Firo Yang
4e02361232 md: fix a build warning
Warning like this:

drivers/md/md.c: In function "update_array_info":
drivers/md/md.c:6394:26: warning: logical not is only applied
to the left hand side of comparison [-Wlogical-not-parentheses]
      !mddev->persistent  != info->not_persistent||

Fix it as Neil Brown said:
mddev->persistent != !info->not_persistent ||

Signed-off-by: Firo Yang <firogm@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-06-17 10:00:38 +10:00
Alexey Dobriyan
4c9309c0cc md: convert to kstrto*()
Convert away from deprecated simple_strto*() functions.

Add "fit into sector_t" checks.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-06-17 10:00:06 +10:00
NeilBrown
ea358cd0d2 md: make sure MD_RECOVERY_DONE is clear before starting recovery/resync
MD_RECOVERY_DONE is normally cleared by md_check_recovery after a
resync etc finished.  However it is possible for raid5_start_reshape
to race and start a reshape before MD_RECOVERY_DONE is cleared.  This
can lean to multiple reshapes running at the same time, which isn't
good.

To make sure it is cleared before starting a reshape, and also clear
it when reaping a thread, just to be safe.

Signed-off-by: NeilBrown  <neilb@suse.de>
2015-06-12 20:16:33 +10:00
NeilBrown
8e8e2518fc md: Close race when setting 'action' to 'idle'.
Checking ->sync_thread without holding the mddev_lock()
isn't really safe, even after flushing the workqueue which
ensures md_start_sync() has been run.

While this code is waiting for the lock, md_check_recovery could reap
the thread itself, and then start another thread (e.g. recovery might
finish, then reshape starts).  When this thread gets the lock
md_start_sync() hasn't run so it doesn't get reaped, but
MD_RECOVERY_RUNNING gets cleared.  This allows two threads to start
which leads to confusion.

So don't both if MD_RECOVERY_RUNNING isn't set, but if it is do
the flush and the test and the reap all under the mddev_lock to
avoid any race with md_check_recovery.

Signed-off-by: NeilBrown <neilb@suse.de>
Fixes: 6791875e2e ("md: make reconfig_mutex optional for writes to md sysfs files.")
Cc: stable@vger.kernel.org (v4.0+)
2015-06-12 20:16:26 +10:00
NeilBrown
c008f1d356 md: don't return 0 from array_state_store
Returning zero from a 'store' function is bad.
The return value should be either len length of the string
or an error.

So use 'len' if 'err' is zero.

Fixes: 6791875e2e ("md: make reconfig_mutex optional for writes to md sysfs files.")
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel (v4.0+)
2015-06-12 20:16:16 +10:00
Linus Torvalds
c492e2d464 Assorted fixes for new RAID5 stripe-batching functionality.
Unfortunately this functionality was merged a little prematurely.
 The necessary testing and code review is now complete (or as
 complete as it can be) and to code passes a variety of tests
 and looks quite sensible.
 
 Also a fix for some recent locking changes - a race was introduced
 which causes a reshape request to sometimes fail.  No data safety issues.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIVAwUAVWf/sDnsnt1WYoG5AQJ8hQ/+KIGUijacXXUXBE4QuO1DMTkltV61bk6E
 TJQ6fTuMvXeOuyGm+BoSFTJrOJiP6/PVxl4jnAkjLlvAK/JVKekG0PXv2flmD9EJ
 udK/g8d2k+4L2O0uiGdGSfOQaEaQ4OQvNmQOP9GF/FXNdyYfZbSJnxG+kzWnStGZ
 3LNEMoDok9TiUDVSJ3PgibnUHYr3zNJFjBGszfRW0HqXBRWM5TI6HQ0bWwrm61mQ
 sIOvFeS7CVOBQWW7zkY3uvz/g7dpuPlXqmDOomF+prKlU320SrpSDDBD2Qg56rXh
 8YGAzLPV8R6xB5hjGFnoHtvxF/f5Fntb3WbC5az0zv+q/phDYA9Nd2UN5APemyGB
 PJuxW4Ojq2DWIAvmf0HQEkvjJlqeugCgIQXJJ8yvIaBXJJjit1jMSEXjolM4vlLh
 h6Su/hwoyTi9NxdYpFeR6JuyHjzTyrjyBkbW8y12wVQjmDncBdKtieYZX4TvPxVz
 n7Qrk2bpFhR/icP6eYWCvt6iwU1e+5lXNb/18AYm9bJe5BE5/N1X0azrxbdZT4cl
 1DvQw2HAMBGp+nSr+R1lqO4yX+busBZUTYsaGvH4T7Ubs+UjwgTE3tPoevj6w829
 0/7r/UPfSn0XFbd5rrPY+bOBsAOIMDG5g3mj7K7+38sVeX9VOVN4sGftS5dWTr9e
 RQBTZAK0+qI=
 =Y0Vm
 -----END PGP SIGNATURE-----

Merge tag 'md/4.1-rc5-fixes' of git://neil.brown.name/md

Pull m,ore md bugfixes gfrom Neil Brown:
 "Assorted fixes for new RAID5 stripe-batching functionality.

  Unfortunately this functionality was merged a little prematurely.  The
  necessary testing and code review is now complete (or as complete as
  it can be) and to code passes a variety of tests and looks quite
  sensible.

  Also a fix for some recent locking changes - a race was introduced
  which causes a reshape request to sometimes fail.  No data safety
  issues"

* tag 'md/4.1-rc5-fixes' of git://neil.brown.name/md:
  md: fix race when unfreezing sync_action
  md/raid5: break stripe-batches when the array has failed.
  md/raid5: call break_stripe_batch_list from handle_stripe_clean_event
  md/raid5: be more selective about distributing flags across batch.
  md/raid5: add handle_flags arg to break_stripe_batch_list.
  md/raid5: duplicate some more handle_stripe_clean_event code in break_stripe_batch_list
  md/raid5: remove condition test from check_break_stripe_batch_list.
  md/raid5: Ensure a batch member is not handled prematurely.
  md/raid5: close race between STRIPE_BIT_DELAY and batching.
  md/raid5: ensure whole batch is delayed for all required bitmap updates.
2015-05-29 10:35:21 -07:00
NeilBrown
56ccc1125b md: fix race when unfreezing sync_action
A recent change removed the need for locking around writing
to "sync_action" (and various other places), but introduced a
subtle race.
When e.g. setting 'reshape' on a 'frozen' array, the 'frozen'
flag is cleared before 'reshape' is set, so the md thread can
get in and start trying recovery - which isn't wanted.

So instead of clearing MD_RECOVERY_FROZEN for any command
except 'frozen', only clear it when each specific command
is parsed.  This allows the handling of 'reshape' to clear
the bit while a lock is held.

Also remove some places where we set MD_RECOVERY_NEEDED,
as it is always set on non-error exit of the function.


Signed-off-by: NeilBrown <neilb@suse.de>
Fixes: 6791875e2e ("md: make reconfig_mutex optional for writes to md sysfs files.")
2015-05-28 18:04:45 +10:00
Linus Torvalds
1daac193f2 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A collection of fixes since the merge window;

   - fix for a double elevator module release, from Chao Yu.  Ancient bug.

   - the splice() MORE flag fix from Christophe Leroy.

   - a fix for NVMe, fixing a patch that went in in the merge window.
     From Keith.

   - two fixes for blk-mq CPU hotplug handling, from Ming Lei.

   - bdi vs blockdev lifetime fix from Neil Brown, fixing and oops in md.

   - two blk-mq fixes from Shaohua, fixing a race on queue stop and a
     bad merge issue with FUA writes.

   - division-by-zero fix for writeback from Tejun.

   - a block bounce page accounting fix, making sure we inc/dec after
     bouncing so that pre/post IO pages match up.  From Wang YanQing"

* 'for-linus' of git://git.kernel.dk/linux-block:
  splice: sendfile() at once fails for big files
  blk-mq: don't lose requests if a stopped queue restarts
  blk-mq: fix FUA request hang
  block: destroy bdi before blockdev is unregistered.
  block:bounce: fix call inc_|dec_zone_page_state on different pages confuse value of NR_BOUNCE
  elevator: fix double release of elevator module
  writeback: use |1 instead of +1 to protect against div by zero
  blk-mq: fix CPU hotplug handling
  blk-mq: fix race between timeout and CPU hotplug
  NVMe: Fix VPD B0 max sectors translation
2015-05-08 19:49:35 -07:00
NeilBrown
6cd18e711d block: destroy bdi before blockdev is unregistered.
Because of the peculiar way that md devices are created (automatically
when the device node is opened), a new device can be created and
registered immediately after the
	blk_unregister_region(disk_devt(disk), disk->minors);
call in del_gendisk().

Therefore it is important that all visible artifacts of the previous
device are removed before this call.  In particular, the 'bdi'.

Since:
commit c4db59d31e
Author: Christoph Hellwig <hch@lst.de>
    fs: don't reassign dirty inodes to default_backing_dev_info

moved the
   device_unregister(bdi->dev);
call from bdi_unregister() to bdi_destroy() it has been quite easy to
lose a race and have a new (e.g.) "md127" be created after the
blk_unregister_region() call and before bdi_destroy() is ultimately
called by the final 'put_disk', which must come after del_gendisk().

The new device finds that the bdi name is already registered in sysfs
and complains

> [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
> [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'

We can fix this by moving the bdi_destroy() call out of
blk_release_queue() (which can happen very late when a refcount
reaches zero) and into blk_cleanup_queue() - which happens exactly when the md
device driver calls it.

Then it is only necessary for md to call blk_cleanup_queue() before
del_gendisk().  As loop.c devices are also created on demand by
opening the device node, we make the same change there.

Fixes: c4db59d31e
Reported-by: Azat Khuzhin <a3at.mail@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org (v4.0)
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-04-27 10:27:20 -06:00
NeilBrown
ac8fa4196d md: allow resync to go faster when there is competing IO.
When md notices non-sync IO happening while it is trying
to resync (or reshape or recover) it slows down to the
set minimum.

The default minimum might have made sense many years ago
but the drives have become faster.  Changing the default
to match the times isn't really a long term solution.

This patch changes the code so that instead of waiting until the speed
has dropped to the target, it just waits until pending requests
have completed.
This means that the delay inserted is a function of the speed
of the devices.

Testing shows that:
 - for some loads, the resync speed is unchanged.  For those loads
   increasing the minimum doesn't change the speed either.
   So this is a good result.  To increase resync speed under such
   loads we would probably need to increase the resync window
   size.

 - for other loads, resync speed does increase to a reasonable
   fraction (e.g. 20%) of maximum possible, and throughput of
   the load only drops a little bit (e.g. 10%)

 - for other loads, throughput of the non-sync load drops quite a bit
   more.  These seem to be latency-sensitive loads.

So it isn't a perfect solution, but it is mostly an improvement.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-22 08:00:40 +10:00
NeilBrown
09314799e4 md: remove 'go_faster' option from ->sync_request()
This option is not well justified and testing suggests that
it hardly ever makes any difference.

The comment suggests there might be a need to wait for non-resync
activity indicated by ->nr_waiting, however raise_barrier()
already waits for all of that.

So just remove it to simplify reasoning about speed limiting.

This allows us to remove a 'FIXME' comment from raid5.c as that
never used the flag.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-22 08:00:40 +10:00
NeilBrown
50c37b136a md: don't require sync_min to be a multiple of chunk_size.
There is really no need for sync_min to be a multiple of
chunk_size, and values read from here often aren't.
That means you cannot read a value and expect to be able
to write it back later.

So remove the chunk_size check, and round down to a multiple
of 4K, to be sure everything works with 4K-sector devices.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-22 08:00:40 +10:00
NeilBrown
d51e4fe6d6 Merge branch 'cluster' into for-next 2015-04-22 08:00:20 +10:00
Goldwyn Rodrigues
97f6cd39da md-cluster: re-add capabilities
When "re-add" is writted to /sys/block/mdXX/md/dev-YYY/state,
the clustered md:

1. Sends RE_ADD message with the desc_nr. Nodes receiving the message
   clear the Faulty bit in their respective rdev->flags.
2. The node initiating re-add, gathers the bitmaps of all nodes
   and copies them into the local bitmap. It does not clear the bitmap
   from which it is copying.
3. Initiating node schedules a md recovery to sync the devices.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-22 07:59:39 +10:00
Goldwyn Rodrigues
a6da4ef85c md: re-add a failed disk
This adds the capability of re-adding a failed disk by
writing "re-add" to /sys/block/mdXX/md/dev-YYY/state.

This facilitates adding disks which have encountered a temporary
error such as a network disconnection/hiccup in an iSCSI device,
or a SAN cable disconnection which has been restored. In such
a situation, you do not need to remove and re-add the device.
Writing re-add to the failed device's state would add it again
to the array and perform the recovery of only the blocks which
were written after the device failed.

This works for generic md, and is not related to clustering. However,
this patch is to ease re-add operations listed above in clustering
environments.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-22 07:59:39 +10:00
Goldwyn Rodrigues
88bcfef7be md-cluster: remove capabilities
This adds "remove" capabilities for the clustered environment.
When a user initiates removal of a device from the array, a
REMOVE message with disk number in the array is sent to all
the nodes which kick the respective device in their own array.

This facilitates the removal of failed devices.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-22 07:59:39 +10:00
Goldwyn Rodrigues
57d051dcca md: Export and rename find_rdev_nr_rcu
This is required by the clustering module (patches to follow) to
find the device to remove or re-add.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-22 07:59:39 +10:00
Goldwyn Rodrigues
fb56dfef4e md: Export and rename kick_rdev_from_array
This export is required for clustering module in order to
co-ordinate remove/readd a rdev from all nodes.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-22 07:59:39 +10:00
Gu Zheng
74672d069b md: fix md io stats accounting broken
Simon reported the md io stats accounting issue:
"
I'm seeing "iostat -x -k 1" print this after a RAID1 rebuild on 4.0-rc5.
It's not abnormal other than it's 3-disk, with one being SSD (sdc) and
the other two being write-mostly:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00   345.00    0.00    0.00    0.00   0.00 100.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00 58779.00    0.00    0.00    0.00   0.00 100.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00    12.00    0.00    0.00    0.00   0.00 100.00
"
The cause is commit "18c0b223cf9901727ef3b02da6711ac930b4e5d4" uses the
generic_start_io_acct to account the disk stats rather than the open code,
but it also introduced the increase to .in_flight[rw] which is needless to
md. So we re-use the open code here to fix it.

Reported-by: Simon Kirby <sim@hostway.ca>
Cc: <stable@vger.kernel.org> 3.19
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-08 12:53:00 +10:00
Goldwyn Rodrigues
fa8259da0e md: Fix stray --cluster-confirm crash
A --cluster-confirm without an --add (by another node) can
crash the kernel.

Fix it by guarding it using a state.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-03-21 10:33:00 +11:00
NeilBrown
0c35bd4723 md: fix problems with freeing private data after ->run failure.
If ->run() fails, it can either free the data structures it
allocated, or leave that task to ->free() which will be called
on failures.

However:
  md.c calls ->free() even if ->private_data is NULL, which
     causes problems in some personalities.
  raid0.c frees the data, but doesn't clear ->private_data,
     which will become a problem when we fix md.c

So better fix both these issues at once.

Reported-by: Richard W.M. Jones <rjones@redhat.com>
Fixes: 5aa61f427e
URL: https://bugzilla.kernel.org/show_bug.cgi?id=94381
Signed-off-by: NeilBrown <neilb@suse.de>
2015-03-21 09:40:36 +11:00
NeilBrown
ba599aca52 md: fix error paths from bitmap_create.
Recent change to bitmap_create mishandles errors.
In particular a failure doesn't alway cause 'err' to be set.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-25 11:44:11 +11:00
NeilBrown
750f199ee8 md: mark some attributes as pre-alloc
Since __ATTR_PREALLOC was introduced in v3.19-rc1~78^2~18
it can now be used by md.

This ensure that writing to these sysfs attributes will never
block due to a memory allocation.
Such blocking could become a deadlock if mdmon is trying to
reconfigure an array after a failure prior to re-enabling writes.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-25 11:38:46 +11:00
Goldwyn Rodrigues
1aee41f637 Add new disk to clustered array
Algorithm:
1. Node 1 issues mdadm --manage /dev/mdX --add /dev/sdYY which issues
   ioctl(ADD_NEW_DISC with disc.state set to MD_DISK_CLUSTER_ADD)
2. Node 1 sends NEWDISK with uuid and slot number
3. Other nodes issue kobject_uevent_env with uuid and slot number
(Steps 4,5 could be a udev rule)
4. In userspace, the node searches for the disk, perhaps
   using blkid -t SUB_UUID=""
5. Other nodes issue either of the following depending on whether the disk
   was found:
   ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CANDIDATE and
	 disc.number set to slot number)
   ioctl(CLUSTERED_DISK_NACK)
6. Other nodes drop lock on no-new-devs (CR) if device is found
7. Node 1 attempts EX lock on no-new-devs
8. If node 1 gets the lock, it sends METADATA_UPDATED after unmarking the disk
   as SpareLocal
9. If not (get no-new-dev lock), it fails the operation and sends METADATA_UPDATED
10. Other nodes understand if the device is added or not by reading the superblock again after receiving the METADATA_UPDATED message.

Signed-off-by: Lidong Zhong <lzhong@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2015-02-23 09:59:07 -06:00
Goldwyn Rodrigues
589a1c4916 Suspend writes in RAID1 if within range
If there is a resync going on, all nodes must suspend writes to the
range. This is recorded in the suspend_info/suspend_list.

If there is an I/O within the ranges of any of the suspend_info,
should_suspend will return 1.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2015-02-23 09:59:07 -06:00
Goldwyn Rodrigues
965400eb61 Send RESYNCING while performing resync start/stop
When a resync is initiated, RESYNCING message is sent to all active
nodes with the range (lo,hi). When the resync is over, a RESYNCING
message is sent with (0,0). A high sector value of zero indicates
that the resync is over.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2015-02-23 09:59:06 -06:00
Goldwyn Rodrigues
1d7e3e9611 Reload superblock if METADATA_UPDATED is received
Re-reads the devices by invalidating the cache.
Since we don't write to faulty devices, this is detected using
events recorded in the devices. If it is old as compared to the mddev
mark it is faulty.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2015-02-23 09:59:06 -06:00
Goldwyn Rodrigues
293467aa1f metadata_update sends message to other nodes
- request to send a message
   - make changes to superblock
   - send messages telling everyone that the superblock has changed
   - other nodes all read the superblock
   - other nodes all ack the messages
   - updating node release the "I'm sending a message" resource.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2015-02-23 09:59:06 -06:00
Goldwyn Rodrigues
f9209a3235 bitmap_create returns bitmap pointer
This is done to have multiple bitmaps open at the same time.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2015-02-23 09:57:57 -06:00
Goldwyn Rodrigues
96ae923ab6 Gather on-going resync information of other nodes
When a node joins, it does not know of other nodes performing resync.
So, each node keeps the resync information in it's LVB. When a new
node joins, it reads the LVB of each "online" bitmap.

[TODO] The new node attempts to get the PW lock on other bitmap, if
it is successful, it reads the bitmap and performs the resync (if
required) on it's behalf.

If the node does not get the PW, it requests CR and reads the LVB
for the resync information.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2015-02-23 07:30:11 -06:00
Goldwyn Rodrigues
cf921cc19c Add node recovery callbacks
DLM offers callbacks when a node fails and the lock remastery
is performed:

1. recover_prep: called when DLM discovers a node is down
2. recover_slot: called when DLM identifies the node and recovery
		can start
3. recover_done: called when all nodes have completed recover_slot

recover_slot() and recover_done() are also called when the node joins
initially in order to inform the node with its slot number. These slot
numbers start from one, so we deduct one to make it start with zero
which the cluster-md code uses.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2015-02-23 07:30:11 -06:00
Goldwyn Rodrigues
ca8895d9bb Return MD_SB_CLUSTERED if mddev is clustered
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2015-02-23 07:28:43 -06:00
Goldwyn Rodrigues
c4ce867fda Introduce md_cluster_info
md_cluster_info stores the cluster information in the MD device.

The join() is called when mddev detects it is a clustered device.
The main responsibilities are:
	1. Setup a DLM lockspace
	2. Setup all initial locks such as super block locks and bitmap lock (will come later)

The leave() clears up the lockspace and all the locks held.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2015-02-23 07:28:42 -06:00
Goldwyn Rodrigues
edb39c9ded Introduce md_cluster_operations to handle cluster functions
This allows dynamic registering of cluster hooks.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2015-02-23 07:28:42 -06:00
NeilBrown
6791875e2e md: make reconfig_mutex optional for writes to md sysfs files.
Rather than using mddev_lock() to take the reconfig_mutex
when writing to any md sysfs file, we only take mddev_lock()
in the particular _store() functions that require it.
Admittedly this is most, but it isn't all.

This also allows us to remove special-case handling for new_dev_store
(in md_attr_store).

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-06 09:32:56 +11:00
NeilBrown
5c47daf6e7 md: move mddev_lock and related to md.h
The one which is not inline (mddev_unlock) gets EXPORTed.

This makes the locking available to personality modules so that it
doesn't have to be imposed upon them.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-06 09:32:56 +11:00
NeilBrown
23da422b19 md: use mddev->lock to protect updates to resync_{min,max}.
There are interdependencies between these two sysfs attributes
and whether a resync is currently running.

Rather than depending on reconfig_mutex to ensure no races when
testing these interdependencies are met, use the spinlock.
This will allow the mutex to be remove from protecting this
code in a subsequent patch.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-06 09:32:56 +11:00
NeilBrown
1b30e66f5a md: minor cleanup in safe_delay_store.
There isn't really much room for races with ->safemode_delay.
But as I am trying to clean up any racy code and will soon
be removing reconfig_mutex protection from most _store()
functions:
 - only set mddev->safemode_delay once, to ensure no code
   can see an intermediate value
 - use safemode_timer to call md_safemode_timeout() rather than
   calling it directly, to ensure it never races with itself.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-06 09:32:56 +11:00
NeilBrown
4af1a04176 md: move GET_BITMAP_FILE ioctl out from mddev_lock.
It makes more sense to report bitmap_info->file, rather than
bitmap->file (the later is only available once the array is
active).

With that change, use mddev->lock to protect bitmap_info being
set to NULL, and we can call get_bitmap_file() without taking
the mutex.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-06 09:32:56 +11:00
NeilBrown
1e594bb24d md: tidy up set_bitmap_file
1/ delay setting mddev->bitmap_info.file until 'f' looks
   usable, so we don't have to unset it.
2/ Don't allow bitmap file to be set if bitmap_info.file
   is already set.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-06 09:32:56 +11:00
NeilBrown
f4ad3d38d4 md: remove unnecessary 'buf' from get_bitmap_file.
'buf' is only used because d_path fills from the end of the
buffer instead of from the start.
We don't need a separate buf to handle that, we just need to use
memmove() to move the string to the start.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-06 09:32:56 +11:00
NeilBrown
758bfc8abf md: remove mddev_lock from rdev_attr_show()
No rdev attributes need locking for 'show', though
state_show() might benefit from ensuring it sees a
consistent set of flags.

None even use rdev->mddev, so testing for it isn't really
needed and it certainly doesn't need to be held constant.

So improve state_show() and remove the locking.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-06 09:32:56 +11:00
NeilBrown
b7b17c9b67 md: remove mddev_lock() from md_attr_show()
Most attributes can be read safely without any locking.
A race might lead to a slightly out-dated value, but nothing wrong.

We already have locking in some places where needed.
All that remains is can_clear_show(), behind_writes_used_show()
and action_show() which are easily fixed.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-06 09:32:55 +11:00
NeilBrown
f97fcad38f md: remove need for mddev_lock() in md_seq_show()
The only access in md_seq_show that could suffer from races
not protected by ->lock is walking the rdev list.
This can receive sufficient protection from 'rcu'.

So use rdev_for_each_rcu() and get rid of mddev_lock().

Now reading /proc/mdstat will never block in md_seq_show.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-06 09:32:55 +11:00
NeilBrown
36d091f475 md: protect ->pers changes with mddev->lock
->pers is already protected by ->reconfig_mutex, and
cannot possibly change when there are threads running or
outstanding IO.

However there are some places where we access ->pers
not in a thread or IO context, and where ->reconfig_mutex
is unnecessarily heavy-weight:  level_show and md_seq_show().

So protect all changes, and those accesses, with ->lock.
This is a step toward taking those accesses out from under
reconfig_mutex.

[Fixed missing "mddev->pers" -> "pers" conversion, thanks to
 Dan Carpenter <dan.carpenter@oracle.com>]

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-04 08:35:53 +11:00
NeilBrown
db721d32b7 md: level_store: group all important changes into one place.
Gather all the changes that can happen atomically and might
be relevant to other code into one place.  This will
make it easier to refine the locking.

Note that this puts quite a few things between mddev_detach()
and ->free().  Enabling this was the point of some recent patches.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-04 08:35:53 +11:00
NeilBrown
afa0f557cb md: rename ->stop to ->free
Now that the ->stop function only frees the private data,
rename is accordingly.

Also pass in the private pointer as an arg rather than using
mddev->private.  This flexibility will be useful in level_store().

Finally, don't clear ->private.  It doesn't make sense to clear
it seeing that isn't what we free, and it is no longer necessary
to clear ->private (it was some time ago before  ->to_remove was
introduced).

Setting ->to_remove in ->free() is a bit of a wart, but not a
big problem at the moment.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-04 08:35:52 +11:00
NeilBrown
5aa61f427e md: split detach operation out from ->stop.
Each md personality has a 'stop' operation which does two
things:
 1/ it finalizes some aspects of the array to ensure nothing
    is accessing the ->private data
 2/ it frees the ->private data.

All the steps in '1' can apply to all arrays and so can be
performed in common code.

This is useful as in the case where we change the personality which
manages an array (in level_store()), it would be helpful to do
step 1 early, and step 2 later.

So split the 'step 1' functionality out into a new mddev_detach().

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-04 08:35:52 +11:00
NeilBrown
64590f45dd md: make merge_bvec_fn more robust in face of personality changes.
There is no locking around calls to merge_bvec_fn(), so
it is possible that calls which coincide with a level (or personality)
change could go wrong.

So create a central dispatch point for these functions and use
rcu_read_lock().
If the array is suspended, reject any merge that can be rejected.
If not, we know it is safe to call the function.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-04 08:35:52 +11:00
NeilBrown
5c675f83c6 md: make ->congested robust against personality changes.
There is currently no locking around calls to the 'congested'
bdi function.  If called at an awkward time while an array is
being converted from one level (or personality) to another, there
is a tiny chance of running code in an unreferenced module etc.

So add a 'congested' function to the md_personality operations
structure, and call it with appropriate locking from a central
'mddev_congested'.

When the array personality is changing the array will be 'suspended'
so no IO is processed.
If mddev_congested detects this, it simply reports that the
array is congested, which is a safe guess.
As mddev_suspend calls synchronize_rcu(), mddev_congested can
avoid races by included the whole call inside an rcu_read_lock()
region.
This require that the congested functions for all subordinate devices
can be run under rcu_lock.  Fortunately this is the case.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-04 08:35:52 +11:00
NeilBrown
85572d7c75 md: rename mddev->write_lock to mddev->lock
This lock is used for (slightly) more than helping with writing
superblocks, and it will soon be extended further.  So the
name is inappropriate.

Also, the _irq variant hasn't been needed since 2.6.37 as it is
never taking from interrupt or bh context.

So:
  -rename write_lock to lock
  -document what it protects
  -remove _irq ... except in md_flush_request() as there
     is no wait_event_lock() (with no _irq).  This can be
     cleaned up after appropriate changes to wait.h.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-04 08:35:52 +11:00
Linus Torvalds
8fd9589ced Three fixes for md
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIVAwUAVIzozznsnt1WYoG5AQKEOQ/9G31KYGhe3/Tn+Hhyy7o16+6fjOBSUN2z
 6GCGtOuHhom0Q+y4cdeAohw+eW9bJPqc9tRyRrLnVCuAkeymPP1t7fIRLGAuOHq5
 QjjX7ZwKK8urmPAFIxRFgJwrJctFEoaCQiZHk4Pcx9YcYm4paJP2liqqQ0khGPcl
 KGGjWgD3KMAG0moJzNmLMWBZevG3SLCTirPxOcDJuBkQ7KfP0e4UGVn/UYXdgF+f
 73yForADVIE/Jq/CXnsOEZa/nPTM7DIsXZAQqqFCnkUiuCvwhgfkowxhwCExx2d6
 w3+I69anD1WII1Z27bNDBgnJGR5xvdiAl6NTJ+GY1uMCo3lnV9OYpQrJxAEJP4dC
 nXILm6SVm7WLfLvVY1lJm3wZndIyWNmTcDxODGyMZhAY4RmjDHG+AZ9tcnY1pM+9
 rt0jHbmaj/ZZ5jKbGxPKBlQ3/U7EP8d4oiruOaDWCGT4sN+e2dbZVLpaWoNaBFUx
 eyUft+DEvY82ryz2Ktn/Exsx1aWk2Cy3cMl0ELq2L/2WAcjhA6gsikNmKG7tfafn
 Bd6WIGzAygATaFs4hOtFANyXqWubWi8TgLUHXNYUkJ2JooaMuSIfWrpeMNJyHHU7
 5bWTaSZqZ86Ygb6upvJqaA3olePMo1RPBsOev46TnuFT9vtfpzPRy6FlvbXfx8sK
 M5YCqNpTDTs=
 =Ar2y
 -----END PGP SIGNATURE-----

Merge tag 'md/3.19' of git://neil.brown.name/md

Pull md updates from Neil Brown:
 "Three fixes for md.

   I did have a largish set of locking changes queued, but late testing
  showed they weren't quite as stable as I thought and while I fixed
  what I found, I decided it safer to delay them a release ...
  particularly as I'll be AFK for a few weeks.  So expect a larger batch
  next time :-)"

* tag 'md/3.19' of git://neil.brown.name/md:
  md: Check MD_RECOVERY_RUNNING as well as ->sync_thread.
  md: fix semicolon.cocci warnings
  md/raid5: fetch_block must fetch all the blocks handle_stripe_dirtying wants.
2014-12-14 12:13:05 -08:00
NeilBrown
f851b60db0 md: Check MD_RECOVERY_RUNNING as well as ->sync_thread.
A recent change to md started the ->sync_thread from a asynchronously
from a work_queue rather than synchronously.  This means that there
can be a small window between the time when MD_RECOVERY_RUNNING is set
and when ->sync_thread is set.

So code that checks ->sync_thread might now conclude that the thread
has not been started and (because a lock is held) will not be started.
That is no longer the case.

Most of those places are best fixed by testing MD_RECOVERY_RUNNING
as well.  To make this completely reliable, we wake_up(&resync_wait)
after clearing that flag as well as after clearing ->sync_thread.

Other places are better served by flushing the relevant workqueue
to ensure that that if the sync thread was starting, it has now
started.  This is particularly best if we are about to stop the
sync thread.

Fixes: ac05f25669
Signed-off-by: NeilBrown <neilb@suse.de>
2014-12-11 10:02:10 +11:00
kbuild test robot
7d7e64f2ec md: fix semicolon.cocci warnings
drivers/md/md.c:7175:43-44: Unneeded semicolon

 Removes unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-12-03 16:07:59 +11:00
Gu Zheng
18c0b223cf md: use generic io stats accounting functions to simplify io stat accounting
Use generic io stats accounting help functions (generic_{start,end}_io_acct)
to simplify io stat accounting.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-24 08:05:16 -07:00
NeilBrown
45eaf45dfa md: Always set RECOVERY_NEEDED when clearing RECOVERY_FROZEN
md_check_recovery will skip any recovery and also clear
MD_RECOVERY_NEEDED if MD_RECOVERY_FROZEN is set.
So when we clear _FROZEN, we must set _NEEDED and ensure that
md_check_recovery gets run.
Otherwise we could miss out on something that is needed.

In particular, this can make it impossible to remove a
failed device from an array is the  'recovery-needed' processing
didn't happen.
Suitable for stable kernels since 3.13.

Cc: stable@vger.kernel.org (3.13+)
Reported-and-tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Fixes: 30b8feb730
Signed-off-by: NeilBrown <neilb@suse.de>
2014-11-17 09:17:46 +11:00
NeilBrown
6c144d3164 md: move EXPORT_SYMBOL to after function in md.c
Signed-off-by: NeilBrown <neilb@suse.de>
2014-10-14 13:08:29 +11:00
NeilBrown
2cbbca5e7c md: discard PRINT_RAID_DEBUG ioctl
All the interesting information printed by this ioctl
is provided in /proc/mdstat and/or sysfs.
So it isn't needed and isn't used and would be best if it didn't
exist.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-10-14 13:08:29 +11:00
NeilBrown
403df47888 md: remove MD_BUG()
Most of the places that call this are doing so pointlessly.
A couple of the others a best replaced with WARN_ON().

Signed-off-by: NeilBrown <neilb@suse.de>
2014-10-14 13:08:29 +11:00
NeilBrown
3adc28d85f md: clean up 'exit' labels in md_ioctl().
There are 4 labels and we only really need two.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-10-14 13:08:29 +11:00
NeilBrown
326eb17d73 md: remove unnecessary test for MD_MAJOR in md_ioctl()
unknown ioctls no longer get this deep into md_ioctl since
md_ioctl_valid() was introduced in 3.14.
So remove the test and the misleading comment.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-10-14 13:08:29 +11:00
NeilBrown
e1960f8c5c md: don't allow "-sync" to be set for device in an active array.
If an array is active, devices can be marked 'faulty', but simply
removing the 'sync' flag is wrong.  That only makes sense
for an array which is not active (and is probably only useful
for testing anyway).

Signed-off-by: NeilBrown <neilb@suse.de>
2014-10-14 13:08:29 +11:00
NeilBrown
f72ffdd686 md: remove unwanted white space from md.c
My editor shows much of this is RED.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-10-14 13:08:29 +11:00
NeilBrown
ac05f25669 md: don't start resync thread directly from md thread.
The main 'md' thread is needed for processing writes, so if it blocks
write requests could be delayed.

Starting a new thread requires some GFP_KERNEL allocations and so can
wait for writes to complete.  This can deadlock.

So instead, ask a workqueue to start the sync thread.
There is no particular rush for this to happen, so any work queue
will do.

MD_RECOVERY_RUNNING is used to ensure only one thread is started.

Reported-by: BillStuff <billstuff2001@sbcglobal.net>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-10-14 13:08:28 +11:00
NeilBrown
8b1afc3d67 md: Just use RCU when checking for overlap between arrays.
We don't really need the full mddev_lock here, and having to
drop it is messy.
RCU is enough to protect these lists.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-10-14 13:08:28 +11:00
Chao Yu
50bd377405 md: avoid potential long delay under pers_lock
printk may cause long time lapse if value of printk_delay in sysctl is
configured large by user. If register_md_personality takes long time to print in
spinlock pers_lock, we may encounter high CPU usage rate when there are other
pers_lock competitors who may be blocked to spin.
We can avoid this condition by moving printk out of coverage of pers_lock
spinlock.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-10-14 13:08:28 +11:00
NeilBrown
0638bb0e73 md: simplify export_array()
We don't really need that for_each loop, or those MD_BUGs.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-10-14 13:08:28 +11:00
NeilBrown
4878e9eb88 md: discard find_rdev_nr in favour of find_rdev_nr_rcu
Having both is a waste - just use the one.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-10-14 13:08:28 +11:00
NeilBrown
1967cd5616 md: use wait_event() to simplify md_super_wait()
md_super_wait is really just wait_event() open-coded.
So use the macro instead.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-10-14 13:08:28 +11:00
NeilBrown
9ba3b7f5d0 md: be more relaxed about stopping an array which isn't started.
In general we don't allow an array to be stopped if it is in use.
However if the array hasn't really been started yet, then any
apparent use is an anomily, probably due to 'udev' or similar
having a look to see what is there.

This means that if something goes wrong while assembling an array
it cannot reliably be un-assembled - STOP_ARRAY could fail.
There is no value here, so change do_md_stop() to succeed
despite concurrent opens if the array has not yet been
activated.  i.e. if ->pers is NULL.

Reported-by: "Baldysiak, Pawel" <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-10-14 13:08:28 +11:00
NeilBrown
d66b1b395a md: don't allow bitmap file to be added to raid0/linear.
An array can only accept a bitmap if it will call bitmap_daemon_work
periodically, which means it needs a thread running.

If there is no thread, don't allow a bitmap to be added.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-08-08 15:43:20 +10:00
Xiao Ni
ac7e50a383 md: Recovery speed is wrong
When we calculate the speed of recovery, the numerator that contains
the recovery done sectors.  It's need to subtract the sectors which
don't finish recovery.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-08-08 12:11:25 +10:00
NeilBrown
af5628f05d md: disable probing for md devices 512 and over.
The way md devices are traditionally created in the kernel
is simply to open the device with the desired major/minor number.

This can be problematic as some support tools, notably udev and
programs run by udev, can open a device just to see what is there, and
find that it has created something.  It is easy for a race to cause
udev to open an md device just after it was destroy, causing it to
suddenly re-appear.

For some time we have had an alternate way to create md devices
  echo md_somename > /sys/modules/md_mod/paramaters/new_array

This will always use a minor number of 512 or higher, which mdadm
normally avoids.
Using this makes the creation-by-opening unnecessary, but does
not disable it, so it is still there to cause problems.

This patch disable probing for devices with a major of 9 (MD_MAJOR)
and a minor of 512 and up.  This devices created by writing to
new_array cannot be re-created by opening the node in /dev.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-07-31 13:54:54 +10:00
NeilBrown
133d4527ea md: flush writes before starting a recovery.
When we write to a degraded array which has a bitmap, we
make sure the relevant bit in the bitmap remains set when
the write completes (so a 're-add' can quickly rebuilt a
temporarily-missing device).

If, immediately after such a write starts, we incorporate a spare,
commence recovery, and skip over the region where the write is
happening (because the 'needs recovery' flag isn't set yet),
then that write will not get to the new device.

Once the recovery finishes the new device will be trusted, but will
have incorrect data, leading to possible corruption.

We cannot set the 'needs recovery' flag when we start the write as we
do not know easily if the write will be "degraded" or not.  That
depends on details of the particular raid level and particular write
request.

This patch fixes a corruption issue of long standing and so it
suitable for any -stable kernel.  It applied correctly to 3.0 at
least and will minor editing to earlier kernels.

Reported-by: Bill <billstuff2001@sbcglobal.net>
Tested-by: Bill <billstuff2001@sbcglobal.net>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/53A518BB.60709@sbcglobal.net
Signed-off-by: NeilBrown <neilb@suse.de>
2014-07-03 10:44:45 +10:00
NeilBrown
9bd3592032 md: make sure GET_ARRAY_INFO ioctl reports correct "clean" status
If an array has a bitmap, the when we set the "has bitmap" flag we
incorrectly clear the "is clean" flag.

"is clean" isn't really important when a bitmap is present, but it is
best to get it right anyway.

Reported-by: George Duffield <forumscollective@gmail.com>
Link: http://lkml.kernel.org/CAG__1a4MRV6gJL38XLAurtoSiD3rLBTmWpcS5HYvPpSfPR88UQ@mail.gmail.com
Fixes: 36fa30636f (v2.6.14)
Signed-off-by: NeilBrown <neilb@suse.de>
2014-07-03 10:44:31 +10:00
NeilBrown
8b32bf5e37 md: md_clear_badblocks should return an error code on failure.
Julia Lawall and coccinelle report that md_clear_badblocks always
returns 0, despite appearing to have an error path.
The error path really should return an error code.  ENOSPC is
reasonably appropriate.

Reported-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-29 16:59:46 +10:00
NeilBrown
bd8839e03b md: refuse to change shape of array if it is active but read-only
read-only arrays should not be changed.  This includes changing
the level, layout, size, or number of devices.

So reject those changes for readonly arrays.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-29 16:59:30 +10:00
NeilBrown
2ac295a544 md: always set MD_RECOVERY_INTR when interrupting a reshape thread.
Commit 8313b8e57f
   md: fix problem when adding device to read-only array with bitmap.

added a called to md_reap_sync_thread() which cause a reshape thread
to be interrupted (in particular, it could cause md_thread() to never even
call md_do_sync()).
However it didn't set MD_RECOVERY_INTR so ->finish_reshape() would not
know that the reshape didn't complete.

This only happens when mddev->ro is set and normally reshape threads
don't run in that situation.  But raid5 and raid10 can start a reshape
thread during "run" is the array is in the middle of a reshape.
They do this even if ->ro is set.

So it is best to set MD_RECOVERY_INTR before abortingg the
sync thread, just in case.

Though it rare for this to trigger a problem it can cause data corruption
because the reshape isn't finished properly.
So it is suitable for any stable which the offending commit was applied to.
(3.2 or later)

Fixes: 8313b8e57f
Cc: stable@vger.kernel.org (3.2+)
Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-29 16:59:06 +10:00
NeilBrown
3991b31ea0 md: always set MD_RECOVERY_INTR when aborting a reshape or other "resync".
If mddev->ro is set, md_to_sync will (correctly) abort.
However in that case MD_RECOVERY_INTR isn't set.

If a RESHAPE had been requested, then ->finish_reshape() will be
called and it will think the reshape was successful even though
nothing happened.

Normally a resync will not be requested if ->ro is set, but if an
array is stopped while a reshape is on-going, then when the array is
started, the reshape will be restarted.  If the array is also set
read-only at this point, the reshape will instantly appear to success,
resulting in data corruption.

Consequently, this patch is suitable for any -stable kernel.

Cc: stable@vger.kernel.org (any)
Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-28 13:39:39 +10:00
NeilBrown
0f62fb220a md: avoid possible spinning md thread at shutdown.
If an md array with externally managed metadata (e.g. DDF or IMSM)
is in use, then we should not set safemode==2 at shutdown because:

1/ this is ineffective: user-space need to be involved in any 'safemode' handling,
2/ The safemode management code doesn't cope with safemode==2 on external metadata
   and md_check_recover enters an infinite loop.

Even at shutdown, an infinite-looping process can be problematic, so this
could cause shutdown to hang.

Cc: stable@vger.kernel.org (any kernel)
Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-06 09:49:31 +10:00
NeilBrown
e2f23b606b md: avoid oops on unload if some process is in poll or select.
If md-mod is unloaded while some process is in poll() or select(),
then that process maintains a pointer to md_event_waiters, and when
the try to unlink from that list, they will oops.

The procfs infrastructure ensures that ->poll won't be called after
remove_proc_entry, but doesn't provide a wait_queue_head for us to
use, and the waitqueue code doesn't provide a way to remove all
listeners from a waitqueue.

So we need to:
 1/ make sure no further references to md_event_waiters are taken (by
    setting md_unloading)
 2/ wake up all processes currently waiting, and
 3/ wait until all those processes have disconnected from our
    wait_queue_head.

Reported-by: "majianpeng" <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-04-09 14:42:34 +10:00
NeilBrown
035328c202 md/bitmap: don't abuse i_writecount for bitmap files.
md bitmap code currently tries to use i_writecount to stop any other
process from writing to out bitmap file.  But that is really an abuse
and has bit-rotted so locking is all wrong.

So discard that - root should be allowed to shoot self in foot.

Still use it in a much less intrusive way to stop the same file being
used as bitmap on two different array, and apply other checks to
ensure the file is at least vaguely usable for bitmap storage
(is regular, is open for write.  Support for ->bmap is already checked
elsewhere).

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-04-09 12:26:59 +10:00
Linus Torvalds
f568849eda Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block
Pull core block IO changes from Jens Axboe:
 "The major piece in here is the immutable bio_ve series from Kent, the
  rest is fairly minor.  It was supposed to go in last round, but
  various issues pushed it to this release instead.  The pull request
  contains:

   - Various smaller blk-mq fixes from different folks.  Nothing major
     here, just minor fixes and cleanups.

   - Fix for a memory leak in the error path in the block ioctl code
     from Christian Engelmayer.

   - Header export fix from CaiZhiyong.

   - Finally the immutable biovec changes from Kent Overstreet.  This
     enables some nice future work on making arbitrarily sized bios
     possible, and splitting more efficient.  Related fixes to immutable
     bio_vecs:

        - dm-cache immutable fixup from Mike Snitzer.
        - btrfs immutable fixup from Muthu Kumar.

  - bio-integrity fix from Nic Bellinger, which is also going to stable"

* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
  xtensa: fixup simdisk driver to work with immutable bio_vecs
  block/blk-mq-cpu.c: use hotcpu_notifier()
  blk-mq: for_each_* macro correctness
  block: Fix memory leak in rw_copy_check_uvector() handling
  bio-integrity: Fix bio_integrity_verify segment start bug
  block: remove unrelated header files and export symbol
  blk-mq: uses page->list incorrectly
  blk-mq: use __smp_call_function_single directly
  btrfs: fix missing increment of bi_remaining
  Revert "block: Warn and free bio if bi_end_io is not set"
  block: Warn and free bio if bi_end_io is not set
  blk-mq: fix initializing request's start time
  block: blk-mq: don't export blk_mq_free_queue()
  block: blk-mq: make blk_sync_queue support mq
  block: blk-mq: support draining mq queue
  dm cache: increment bi_remaining when bi_end_io is restored
  block: fixup for generic bio chaining
  block: Really silence spurious compiler warnings
  block: Silence spurious compiler warnings
  block: Kill bio_pair_split()
  ...
2014-01-30 11:19:05 -08:00
Linus Torvalds
5c85121bf6 md updates for 3.14
All bug fixes, two tagged for -stable.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIVAwUAUuG8WTnsnt1WYoG5AQKW7A//V93TYUiUAG6zNUFrNZjuoXP0ym3jlpkH
 eIIFdcV7rr0Irgtd8+s9cW8Cjsbq3d/vMbFlwP1Co32mnCnFFojeKtCvM9GqkYrH
 o4Zr1nVAVYzKO4awByK3wBT9WbzEc/XlDgYQIpZExIYeZdzLOm6HyvlRbcE86Ug5
 QoGYOUlLu4LUZmFgB9zQ7JM0GACV5pS1afSObtACj2t2x5GVHNU84u+M+D8urPXO
 wnf+AIAzquh5F+8MX+DxmMEUaSzUHf8fXOM3jYVbzPI71SpaHssL4SwBn+j4I/8/
 SCSqeIh7qMSuqUy63/iHKCy5qAgNuRdL9fYlOTkpxzHm81Ddj8u7fySsApggVOa2
 yeKTkSRlsMFeu+LiGKNi/fINVxboaoYJVZ2DTNtKxSuW2VL2aPNz1Qjq4QnR3nSI
 LpaB3VeVKdMsH8Em1a8cgZWcjo5YFAcNtUnJq2fvj9VZ3SJNw4ZoKDL+l718iGao
 xIwAXMSafAHQVAAaNVFkwrea13TeOyxikY5Ra4vWfm+Fw8TzmYq5DqO0zaILwdAJ
 2FnNj2/2y3hk2K7qBcEvjjEakxPlTwzrzxZMfJDRMuQLqvrjbXiMGOnWzgl1D/9x
 4/uPjeFZLG7byxmIyg4Y83NkPgkWnRPpGK98r26pUH1UgnRF0a5aUFXQk7rsQrU6
 noRkZ9EPD8s=
 =d81E
 -----END PGP SIGNATURE-----

Merge tag 'md/3.14' of git://neil.brown.name/md

Pull md updates from Neil Brown:
 "All bug fixes, two tagged for -stable"

* tag 'md/3.14' of git://neil.brown.name/md:
  md/raid5: close recently introduced race in stripe_head management.
  md/raid5: fix long-standing problem with bitmap handling on write failure.
  md: check command validity early in md_ioctl().
  md: ensure metadata is writen after raid level change.
  md/raid10: avoid fullsync when not necessary.
  md: allow a partially recovered device to be hot-added to an array.
  md: Change handling of save_raid_disk and metadata update during recovery.
2014-01-24 17:41:50 -08:00
Nicolas Schichan
cb335f88eb md: check command validity early in md_ioctl().
Verify that the cmd parameter passed to md_ioctl() is valid before
doing anything.

This fixes mddev->hold_active being set to 0 when an invalid ioctl
command is passed to md_ioctl() before the array has been configured.

Clearing mddev->hold_active in that case can lead to a livelock
situation when an invalid ioctl number is given to md_ioctl() by a
process when the mddev is currently being opened by another process:

Process 1				Process 2
---------				---------

md_alloc()
  mddev_find()
  -> returns a new mddev with
     hold_active == UNTIL_IOCTL
  add_disk()
  -> sends KOBJ_ADD uevent

					(sees KOBJ_ADD uevent for device)
                    			md_open()
                    			md_ioctl(INVALID_IOCTL)
                    			-> returns ENODEV and clears
                       			   mddev->hold_active
                    			md_release()
                      			md_put()
                      			-> deletes the mddev as
                         		   hold_active is 0

md_open()
  mddev_find()
  -> returns a newly
    allocated mddev with
    mddev->gendisk == NULL
-> returns with ERESTARTSYS
   (kernel restarts the open syscall)

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-16 08:55:00 +11:00
Linus Torvalds
1a60864fc1 md: half a dozen bug fixes for 3.13
All of these fix real bugs the people have hit, and are tagged
 for -stable.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIVAwUAUtYZqznsnt1WYoG5AQK50g//XuqVR/esIpGR+knf+1sD3Zk85Vp33kGL
 2UfbQbi40q/uLjBhJhOSkx/sYGw1Eo255vNX+yMVjYT9F+xbhI8vlLfecqx5Fk5J
 M+JH1sM7E2T79boFLoOBGSl/qppsQsPHa3p87FmFHQrrAuEMIbFiP98MnQjdSiv4
 Cu9cAR7x7njepHeMXBFiV7URaYtCHAXR9iMdkebkKIFlfND8w2QYD+LWo3SzBKs9
 jTrSBJRpXLHE+bZLOQPhAryb7nWkcT1R7N0vsVMQKcq1o6ZiRNnk/B9xNtV34hkc
 5zwTPe/d5AsV6Tsxg0dSs7xcBn/A+F5lg8fzdOhyE1F13COmB7sepjPTMPAy/oP1
 zjyPwnnWkHMDUW2usf3aqPMt+LGMofRCJHXjkqpMgIWQ96SQUY8F9PPxchkUCsx/
 A38I+vXl2jGDHh/DFSduef3sDOF6TYyKyLteJftyny96dc1RutrZSbHPdrkDz1YQ
 6zcyvpv0FexiXITrLg70FG8fnRMK91ZfHrmuzVP7tpm2TyeIfDriLhTAIXAcXHOT
 l22a1bNj4shFfztnD0CbH6nY/iJM7ov0x5+IyG5/iYbipon02MenQeV9km6JVwQb
 OCGHYCTswiFSduX1E1ru52dHXifbANWgzcUH0sjGQ0YZNmxvPRBWDjB1H2J1auzW
 J8T10qimw1w=
 =uvyl
 -----END PGP SIGNATURE-----

Merge tag 'md/3.13-fixes' of git://neil.brown.name/md

Pull late md fixes from Neil Brown:
 "Half a dozen md bug fixes.

  All of these fix real bugs the people have hit, and are tagged for
  -stable.  Sorry they are late ....  Christmas holidays and all that.
  Hopefully they can still squeak into 3.13"

* tag 'md/3.13-fixes' of git://neil.brown.name/md:
  md: fix problem when adding device to read-only array with bitmap.
  md/raid10: fix bug when raid10 recovery fails to recover a block.
  md/raid5: fix a recently broken BUG_ON().
  md/raid1: fix request counting bug in new 'barrier' code.
  md/raid10: fix two bugs in handling of known-bad-blocks.
  md/raid5: Fix possible confusion when multiple write errors occur.
2014-01-15 15:07:36 +07:00
NeilBrown
830778a180 md: ensure metadata is writen after raid level change.
level_store() currently does not make sure the metadata is
updates to reflect the new raid level.  It simply sets MD_CHANGE_DEVS.

Any level with a ->thread will quickly notice this and update the
metadata.  However RAID0 and Linear do not have a thread so no
metadata update happens until the array is stopped.  At that point the
metadata is written.

This is later that we would like.  While the delay doesn't risk any
data it can cause confusion.  So if there is no md thread, immediately
update the metadata after a level change.

Reported-by: Richard Michael <rmichael@edgeofthenet.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:21 +11:00
NeilBrown
7eb418851f md: allow a partially recovered device to be hot-added to an array.
When adding a new device into an array it is normally important to
clear any stale data from ->recovery_offset else the new device may
not be recovered properly.

However when re-adding a device which is known to be nearly in-sync,
this is not needed and can be detrimental.  The (bitmap-based)
resync will still happen, and further recovery is only needed from
where-ever it was already up to.

So if save_raid_disk is set, signifying a re-add, don't clear
->recovery_offset.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:21 +11:00
NeilBrown
f466722ca6 md: Change handling of save_raid_disk and metadata update during recovery.
Since commit d70ed2e4fa
   MD: Allow restarting an interrupted incremental recovery.

we don't write out the metadata to devices while they are recovering.
This had a good reason, but has unfortunate consequences.  This patch
changes things to make them work better.

At issue is what happens if the array is shut down while a recovery is
happening, particularly a bitmap-guided recovery.
Ideally the recovery should pick up where it left off.
However the metadata cannot represent the state "A recovery is in
process which is guided by the bitmap".

Before the above mentioned commit, we wrote metadata to the device
which said "this is being recovered and it is up to <here>".  So after
a restart, a full recovery (not bitmap-guided) would happen from
where-ever it was up to.

After the commit the metadata wasn't updated so it still said "This
device is fully in sync with <this> event count".  That leads to a
bitmap-based recovery following the whole bitmap, which should be a
lot less work than a full recovery from some starting point.  So this
was an improvement.

However updates some metadata but not all leads to other problems.
In particular, the metadata written to the fully-up-to-date device
record that the array has all devices present (even though some are
recovering).  So on restart, mdadm wants to find all devices and
expects them to have current event counts.
Obviously it doesn't (some have old event counts) so (when assembling
with --incremental) it waits indefinitely for the rest of the expected
devices.

It really is wrong to not update all the metadata together.  Do that
is bound to cause confusion.
Instead, we should make it possible to record the truth in the
metadata.  i.e. we need to be able to record that a device is being
recovered based on the bitmap.
We already have a Feature flag to say that recovery is happening.  We
now add another one to say that it is a bitmap-based recovery.

With this we can remove the code that disables the write-out of
metadata on some devices.

So this patch:
 - moves the setting of 'saved_raid_disk' from add_new_disk to
   the validate_super methods.  This makes sure it is always set
   properly, both when adding a new device to an array, and when
   assembling an array from a collection of devices.
 - Adds a metadata flag MD_FEATURE_RECOVERY_BITMAP which is only
   used if MD_FEATURE_RECOVERY_OFFSET is set, and record that a
   bitmap-based recovery is allowed.
   This is only present in v1.x metadata. v0.90 doesn't support
   devices which are in the middle of recovery at all.
 - Only skips writing metadata to Faulty devices.

 - Also allows rdev state to be set to "-insync" via sysfs.
   This can be used for external-metadata arrays.  When the
   'role' is set the device is assumed to be in-sync.  If, after
   setting the role, we set the state to "-insync", the role is
   moved to saved_raid_disk which effectively says the device is
   partly in-sync with that slot and needs a bitmap recovery.

Cc: Andrei Warkentin <andreiw@vmware.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:21 +11:00
NeilBrown
8313b8e57f md: fix problem when adding device to read-only array with bitmap.
If an array is started degraded, and then the missing device
is found it can be re-added and a minimal bitmap-based recovery
will bring it fully up-to-date.

If the array is read-only a recovery would not be allowed.
But also if the array is read-only and the missing device was
present very recently, then there could be no need for any
recovery at all, so we simply include the device in the read-only
array without any recovery.

However... if the missing device was removed a little longer ago
it could be missing some updates, but if a bitmap is present it will
be conditionally accepted pending a bitmap-based update.  We don't
currently detect this case properly and will include that old
device into the read-only array with no recovery even though it really
needs a recovery.

This patch keeps track of whether a bitmap-based-recovery is really
needed or not in the new Bitmap_sync rdev flag.  If that is set,
then the device will not be added to a read-only array.

Cc: Andrei Warkentin <andreiw@vmware.com>
Fixes: d70ed2e4fa
Cc: stable@vger.kernel.org (3.2+)
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:08 +11:00
Jens Axboe
b28bc9b38c Linux 3.13-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJSwLfoAAoJEHm+PkMAQRiGi6QH/1U1B7lmHChDTw3jj1lfm9gA
 189Si4QJlnxFWCKHvKEL+pcaVuACU+aMGI8+KyMYK4/JfuWVjjj5fr/SvyHH2/8m
 LdSK8aHMhJ46uBS4WJ/l6v46qQa5e2vn8RKSBAyKm/h4vpt+hd6zJdoFrFai4th7
 k/TAwOAEHI5uzexUChwLlUBRTvbq4U8QUvDu+DeifC8cT63CGaaJ4qVzjOZrx1an
 eP6UXZrKDASZs7RU950i7xnFVDQu4PsjlZi25udsbeiKcZJgPqGgXz5ULf8ZH8RQ
 YCi1JOnTJRGGjyIOyLj7pyB01h7XiSM2+eMQ0S7g54F2s7gCJ58c2UwQX45vRWU=
 =/4/R
 -----END PGP SIGNATURE-----

Merge tag 'v3.13-rc6' into for-3.14/core

Needed to bring blk-mq uptodate, since changes have been going in
since for-3.14/core was established.

Fixup merge issues related to the immutable biovec changes.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/blk-flush.c
	fs/btrfs/check-integrity.c
	fs/btrfs/extent_io.c
	fs/btrfs/scrub.c
	fs/logfs/dev_bdev.c
2013-12-31 09:51:02 -07:00
Linus Torvalds
5ee540613d Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
 "A small collection of fixes for the current series. It contains:

   - A fix for a use-after-free of a request in blk-mq.  From Ming Lei

   - A fix for a blk-mq bug that could attempt to dereference a NULL rq
     if allocation failed

   - Two xen-blkfront small fixes

   - Cleanup of submit_bio_wait() type uses in the kernel, unifying
     that.  From Kent

   - A fix for 32-bit blkg_rwstat reading.  I apologize for this one
     looking mangled in the shortlog, it's entirely my fault for missing
     an empty line between the description and body of the text"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq: fix use-after-free of request
  blk-mq: fix dereference of rq->mq_ctx if allocation fails
  block: xen-blkfront: Fix possible NULL ptr dereference
  xen-blkfront: Silence pfn maybe-uninitialized warning
  block: submit_bio_wait() conversions
  Update of blkg_stat and blkg_rwstat may happen in bh context
2013-12-05 15:33:27 -08:00
NeilBrown
142d44c310 md: test mddev->flags more safely in md_check_recovery.
commit 7a0a5355cb md: Don't test all of mddev->flags at once.
made most tests on mddev->flags safer, but missed one.

When
commit 260fa034ef md: avoid deadlock when dirty buffers during md_stop.
added MD_STILL_CLOSED, this caused md_check_recovery to misbehave.
It can think there is something to do but find nothing.  This can
lead to the md thread spinning during array shutdown.

https://bugzilla.kernel.org/show_bug.cgi?id=65721

Reported-and-tested-by: Richard W.M. Jones <rjones@redhat.com>
Fixes: 260fa034ef
Cc: stable@vger.kernel.org (3.12)
Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-28 11:00:08 +11:00
Kent Overstreet
c170bbb45f block: submit_bio_wait() conversions
It was being open coded in a few places.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Chris Mason <chris.mason@fusionio.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-24 16:33:41 -07:00