linux/drivers/md
Mike Snitzer 6acfe68bac dm: fix excessive dm-mq context switching
Request-based DM's blk-mq support (dm-mq) was reported to be 50% slower
than if an underlying null_blk device were used directly.  One of the
reasons for this drop in performance is that blk_insert_clone_request()
was calling blk_mq_insert_request() with @async=true.  This forced the
use of kblockd_schedule_delayed_work_on() to run the blk-mq hw queues
which ushered in ping-ponging between process context (fio in this case)
and kblockd's kworker to submit the cloned request.  The ftrace
function_graph tracer showed:

  kworker-2013  =>   fio-12190
  fio-12190    =>  kworker-2013
  ...
  kworker-2013  =>   fio-12190
  fio-12190    =>  kworker-2013
  ...

Fixing blk_insert_clone_request()'s blk_mq_insert_request() call to
_not_ use kblockd to submit the cloned requests isn't enough to
eliminate the observed context switches.

In addition to this dm-mq specific blk-core fix, there are 2 DM core
fixes to dm-mq that (when paired with the blk-core fix) completely
eliminate the observed context switching:

1)  don't blk_mq_run_hw_queues in blk-mq request completion

    Motivated by desire to reduce overhead of dm-mq, punting to kblockd
    just increases context switches.

    In my testing against a really fast null_blk device there was no benefit
    to running blk_mq_run_hw_queues() on completion (and no other blk-mq
    driver does this).  So hopefully this change doesn't induce the need for
    yet another revert like commit 621739b00e !

2)  use blk_mq_complete_request() in dm_complete_request()

    blk_complete_request() doesn't offer the traditional q->mq_ops vs
    .request_fn branching pattern that other historic block interfaces
    do (e.g. blk_get_request).  Using blk_mq_complete_request() for
    blk-mq requests is important for performance.  It should be noted
    that, like blk_complete_request(), blk_mq_complete_request() doesn't
    natively handle partial completions -- but the request-based
    DM-multipath target does provide the required partial completion
    support by dm.c:end_clone_bio() triggering requeueing of the request
    via dm-mpath.c:multipath_end_io()'s return of DM_ENDIO_REQUEUE.

dm-mq fix #2 is _much_ more important than #1 for eliminating the
context switches.
Before: cpu          : usr=15.10%, sys=59.39%, ctx=7905181, majf=0, minf=475
After:  cpu          : usr=20.60%, sys=79.35%, ctx=2008, majf=0, minf=472

With these changes multithreaded async read IOPs improved from ~950K
to ~1350K for this dm-mq stacked on null_blk test-case.  The raw read
IOPs of the underlying null_blk device for the same workload is ~1950K.

Fixes: 7fb4898e0 ("block: add blk-mq support to blk_insert_cloned_request()")
Fixes: bfebd1cdb ("dm: add full blk-mq support to request-based DM")
Cc: stable@vger.kernel.org # 4.1+
Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
2016-02-22 11:04:40 -05:00
..
bcache Merge branch 'for-4.5/drivers' of git://git.kernel.dk/linux-block 2016-01-21 18:19:38 -08:00
persistent-data dm space map metadata: remove unused variable in brb_pop() 2015-12-14 09:26:01 -05:00
bitmap.c md-cluster: delete useless code 2016-01-24 18:13:37 -08:00
bitmap.h md-cluster: Use a small window for resync 2015-10-12 01:32:05 -05:00
dm-bio-prison.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
dm-bio-prison.h dm bio prison: add dm_cell_promote_or_release() 2015-05-29 14:19:06 -04:00
dm-bio-record.h dm: Refactor for new bio cloning/splitting 2013-11-23 22:33:55 -08:00
dm-bufio.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-01-12 17:11:47 -08:00
dm-bufio.h dm snapshot: use dm-bufio prefetch 2014-01-14 23:23:03 -05:00
dm-builtin.c dm sysfs: fix a module unload race 2014-01-14 23:23:04 -05:00
dm-cache-block-types.h dm cache: revert "remove remainder of distinct discard block size" 2014-11-10 15:25:30 -05:00
dm-cache-metadata.c - Revert a dm-multipath change that caused a regression for unprivledged 2015-11-04 21:19:53 -08:00
dm-cache-metadata.h dm cache: add fail io mode and needs_check flag 2015-06-11 17:13:00 -04:00
dm-cache-policy-cleaner.c - Revert a dm-multipath change that caused a regression for unprivledged 2015-11-04 21:19:53 -08:00
dm-cache-policy-internal.h dm cache: age and write back cache entries even without active IO 2015-06-11 17:13:01 -04:00
dm-cache-policy-mq.c dm: convert ffs to __ffs 2015-10-31 19:06:01 -04:00
dm-cache-policy-smq.c dm: convert ffs to __ffs 2015-10-31 19:06:01 -04:00
dm-cache-policy.c dm cache: add policy name to status output 2014-01-16 13:44:11 -05:00
dm-cache-policy.h dm cache: age and write back cache entries even without active IO 2015-06-11 17:13:01 -04:00
dm-cache-target.c dm: don't save and restore bi_private 2015-12-10 10:38:56 -05:00
dm-crypt.c dm crypt: fix a possible hang due to race condition on exit 2015-11-19 13:38:30 -05:00
dm-delay.c dm delay: document that offsets are specified in sectors 2015-10-31 19:06:05 -04:00
dm-era-target.c dm persistent data: eliminate unnecessary return values 2015-10-31 19:06:02 -04:00
dm-exception-store.c - Revert a dm-multipath change that caused a regression for unprivledged 2015-11-04 21:19:53 -08:00
dm-exception-store.h dm snapshot: fix hung bios when copy error occurs 2016-01-08 20:03:05 -05:00
dm-flakey.c dm: refactor ioctl handling 2015-10-31 19:05:59 -04:00
dm-io.c md: more open-coded offset_in_page() 2016-01-04 10:29:12 -05:00
dm-ioctl.c char: make misc_deregister a void function 2015-08-05 10:35:49 -07:00
dm-kcopyd.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
dm-linear.c dm linear: remove redundant target name from error messages 2015-10-31 19:06:03 -04:00
dm-log-userspace-base.c dm: drop NULL test before kmem_cache_destroy() and mempool_destroy() 2015-10-31 19:06:00 -04:00
dm-log-userspace-transfer.c dm log userspace transfer: match wait_for_completion_timeout return type 2015-04-15 12:10:20 -04:00
dm-log-userspace-transfer.h
dm-log-writes.c dm: refactor ioctl handling 2015-10-31 19:05:59 -04:00
dm-log.c
dm-mpath.c dm mpath: fix infinite recursion in ioctl when no paths and !queue_if_no_path 2015-11-17 14:19:00 -05:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c
dm-raid1.c Merge tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm 2015-09-02 16:35:26 -07:00
dm-raid.c dm raid: fix round up of default region size 2015-10-02 12:02:31 -04:00
dm-region-hash.c dm: convert ffs to __ffs 2015-10-31 19:06:01 -04:00
dm-round-robin.c
dm-service-time.c
dm-snap-persistent.c dm snapshot: fix hung bios when copy error occurs 2016-01-08 20:03:05 -05:00
dm-snap-transient.c dm snapshot: fix hung bios when copy error occurs 2016-01-08 20:03:05 -05:00
dm-snap.c dm snapshot: fix hung bios when copy error occurs 2016-01-08 20:03:05 -05:00
dm-stats.c dm stats: report precise_timestamps and histogram in @stats_list output 2015-08-18 17:20:03 -04:00
dm-stats.h dm stats: support precise timestamps 2015-06-17 12:40:40 -04:00
dm-stripe.c Merge tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm 2015-09-02 16:35:26 -07:00
dm-switch.c dm switch: simplify conditional in alloc_region_table() 2015-10-31 19:06:06 -04:00
dm-sysfs.c dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr 2015-04-15 12:10:17 -04:00
dm-table.c block: Inline blk_integrity in struct gendisk 2015-10-21 14:42:42 -06:00
dm-target.c dm: allocate requests in target when stacking on blk-mq devices 2015-02-09 13:06:47 -05:00
dm-thin-metadata.c dm thin metadata: make dm_thin_find_mapped_range() atomic 2015-12-10 10:38:55 -05:00
dm-thin-metadata.h dm thin metadata: add dm_thin_remove_range() 2015-06-11 17:13:04 -04:00
dm-thin.c dm thin: bump thin and thin-pool target versions 2016-01-06 20:59:40 -05:00
dm-uevent.c
dm-uevent.h
dm-verity-fec.c dm verity: add ignore_zero_blocks feature 2015-12-10 10:39:03 -05:00
dm-verity-fec.h dm verity: add support for forward error correction 2015-12-10 10:39:03 -05:00
dm-verity-target.c dm verity: add ignore_zero_blocks feature 2015-12-10 10:39:03 -05:00
dm-verity.h dm verity: add ignore_zero_blocks feature 2015-12-10 10:39:03 -05:00
dm-zero.c block: add a bi_error field to struct bio 2015-07-29 08:55:15 -06:00
dm.c dm: fix excessive dm-mq context switching 2016-02-22 11:04:40 -05:00
dm.h block: kill merge_bvec_fn() completely 2015-08-13 12:31:57 -06:00
faulty.c MD: rename some functions 2016-01-20 13:52:20 -08:00
Kconfig dm verity: add support for forward error correction 2015-12-10 10:39:03 -05:00
linear.c block: kill merge_bvec_fn() completely 2015-08-13 12:31:57 -06:00
linear.h
Makefile dm verity: add support for forward error correction 2015-12-10 10:39:03 -05:00
md-cluster.c md-cluster: fix missing memory free 2016-01-24 18:13:18 -08:00
md-cluster.h md-cluster: append some actions when change bitmap from clustered to none 2016-01-06 11:38:57 +11:00
md.c md updates for 4.5 2016-01-15 12:28:00 -08:00
md.h md updates for 4.5 2016-01-15 12:28:00 -08:00
multipath.c md/raid: only permit hot-add of compatible integrity profiles 2016-01-14 11:49:57 +11:00
multipath.h
raid0.c treewide: Fix typos in printk 2015-12-08 14:59:19 +01:00
raid0.h block: kill merge_bvec_fn() completely 2015-08-13 12:31:57 -06:00
raid1.c MD: rename some functions 2016-01-20 13:52:20 -08:00
raid1.h md-cluster: Use a small window for resync 2015-10-12 01:32:05 -05:00
raid5-cache.c raid5-cache: handle journal hotadd in quiesce 2016-01-14 11:49:43 +11:00
raid5.c MD: rename some functions 2016-01-20 13:52:20 -08:00
raid5.h raid5-cache: IO error handling 2015-11-01 13:48:29 +11:00
raid10.c MD: rename some functions 2016-01-20 13:52:20 -08:00
raid10.h md/raid10: ensure device failure recorded before write request returns. 2015-08-31 19:43:45 +02:00