Commit Graph

1072658 Commits

Author SHA1 Message Date
Colin Ian King
302f035141 dm cache policy smq: make static read-only array table const
The 'table' static array is read-only so it make sense to make
it const. Add in the int type to clean up checkpatch warning.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-22 10:35:53 -05:00
Mike Snitzer
c357342186 dm delay: use dm_submit_bio_remap
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:36:34 -05:00
Mike Snitzer
e5524e128f dm crypt: use dm_submit_bio_remap
Care was taken to support kcryptd_io_read being called from crypt_map
or workqueue.  Use of an intermediate CRYPT_MAP_READ_GFP gfp_t
(defined as GFP_NOWAIT) should protect from maintenance burden if that
flag were to change for some reason.

Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:36:34 -05:00
Mike Snitzer
0fbb4d93b3 dm: add dm_submit_bio_remap interface
Where possible, switch from early bio-based IO accounting (at the time
DM clones each incoming bio) to late IO accounting just before each
remapped bio is issued to underlying device via submit_bio_noacct().

Allows more precise bio-based IO accounting for DM targets that use
their own workqueues to perform additional processing of each bio in
conjunction with their DM_MAPIO_SUBMITTED return from their map
function. When a target is updated to use dm_submit_bio_remap() they
must also set ti->accounts_remapped_io to true.

Use xchg() in start_io_acct(), as suggested by Mikulas, to ensure each
IO is only started once.  The xchg race only happens if
__send_duplicate_bios() sends multiple bios -- that case is reflected
via tio->is_duplicate_bio.  Given the niche nature of this race, it is
best to avoid any xchg performance penalty for normal IO.

For IO that was never submitted with dm_bio_submit_remap(), but the
target completes the clone with bio_endio, accounting is started then
ended and pending_io counter decremented.

Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:36:33 -05:00
Mike Snitzer
e6fc9f62ce dm: flag clones created by __send_duplicate_bios
Formally disallow dm_accept_partial_bio() on clones created by
__send_duplicate_bios() because their len_ptr points to a shared
unsigned int.  __send_duplicate_bios() is only used for flush bios
and other "abnormal" bios (discards, writezeroes, etc). And
dm_accept_partial_bio() already didn't support flush bios.

Also refactor __send_changing_extent_only() to reflect it cannot fail.
As such __send_changing_extent_only() can update the clone_info before
__send_duplicate_bios() is called to fan-out __map_bio() calls.

Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:36:32 -05:00
Mike Snitzer
300432f58b dm: reduce dm_io and dm_target_io struct sizes
Remove one 4 byte hole in dm_io struct.
Remove two 4 byte holes in dm_target_io struct.

Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:36:31 -05:00
Mike Snitzer
018b05ebbf dm: move duplicate code from callers of alloc_tio into alloc_tio
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:36:30 -05:00
Mike Snitzer
743598f049 dm: record old_sector in dm_target_io before calling map function
Prep for being able to defer trace_block_bio_remap() until when the
bio is remapped and submitted by the DM target.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:36:29 -05:00
Mike Snitzer
77c11720a4 dm: remove legacy code only needed before submit_bio recursion
Commit 8615cb65bd ("dm: remove useless loop in
__split_and_process_bio") showcased that we no longer loop.

Remove the bio_advance() in __split_and_process_bio() that was only
needed when looping was possible.

Similarly there is no need to advance the bio, using ci->sector
cursor, in __send_duplicate_bios().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:36:28 -05:00
Mike Snitzer
0119ab14c3 dm: remove unused mapped_device argument from free_tio
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:36:27 -05:00
Mike Snitzer
5b27b8ddbf dm: remove impossible BUG_ON in __send_empty_flush
The flush_bio in question was just initialized to be empty, so there
is no way bio_has_data() will return true.  So remove stale BUG_ON().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:36:26 -05:00
Mike Snitzer
90a2326ede dm: reduce code duplication in __map_bio
Error path code (for handling DM_MAPIO_REQUEUE and DM_MAPIO_KILL) is
effectively identical.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:36:25 -05:00
Mike Snitzer
d41e077ab6 dm: refactor dm_split_and_process_bio a bit
Remove needless branching and indentation. Leaves code to catch
malformed op_is_zone_mgmt bios (they shouldn't have a payload).

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:36:24 -05:00
Mike Snitzer
66bdaa4302 dm: fold __clone_and_map_data_bio into __split_and_process_bio
Fold __clone_and_map_data_bio into its only caller.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:36:23 -05:00
Mike Snitzer
96c9865cb6 dm: rename split functions
Rename __split_and_process_bio to dm_split_and_process_bio.
Rename __split_and_process_non_flush to __split_and_process_bio.

Also fix a stale comment and whitespace.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:36:22 -05:00
Mike Snitzer
205649d84c dm: reorder members in mapped_device struct
Improves alignment and groups related members relative to cachelines.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:36:21 -05:00
Mike Snitzer
0ab30b4079 dm: eliminate copying of dm_io fields in dm_io_dec_pending
There is no need for dm_io_dec_pending() to copy dm_io fields
anymore now that DM provides its own pending_io counters again.

The race documented in commit d208b89401 ("dm: fix mempool NULL
pointer race when completing IO") no longer exists now that block
core's in_flight counters aren't used to signal all dm_io is
complete.

Also, rename {start,end}_io_acct to dm_{start,end}_io_acct.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:36:18 -05:00
Mike Snitzer
0cdb90f0f3 dm stats: fix too short end duration_ns when using precise_timestamps
dm_stats_account_io()'s STAT_PRECISE_TIMESTAMPS support doesn't handle
the fact that with commit b879f915bc ("dm: properly fix redundant
bio-based IO accounting") io->start_time _may_ be in the past (meaning
the start_io_acct() was deferred until later).

Add a new dm_stats_recalc_precise_timestamps() helper that will
set/clear a new 'precise_timestamps' flag in the dm_stats struct based
on whether any configured stats enable STAT_PRECISE_TIMESTAMPS.
And update DM core's alloc_io() to use dm_stats_record_start() to set
stats_aux.duration_ns if stats->precise_timestamps is true.

Also, remove unused 'last_sector' and 'last_rw' members from the
dm_stats struct.

Fixes: b879f915bc ("dm: properly fix redundant bio-based IO accounting")
Cc: stable@vger.kernel.org
Co-developed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:35:39 -05:00
Mike Snitzer
8d394bc4ad dm: fix double accounting of flush with data
DM handles a flush with data by first issuing an empty flush and then
once it completes the REQ_PREFLUSH flag is removed and the payload is
issued.  The problem fixed by this commit is that both the empty flush
bio and the data payload will account the full extent of the data
payload.

Fix this by factoring out dm_io_acct() and having it wrap all IO
accounting to set the size of  bio with REQ_PREFLUSH to 0, account the
IO, and then restore the original size.

Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:35:25 -05:00
Mike Snitzer
9f6dc63376 dm: interlock pending dm_io and dm_wait_for_bios_completion
Commit d208b89401 ("dm: fix mempool NULL pointer race when
completing IO") didn't go far enough.

When bio_end_io_acct ends the count of in-flight I/Os may reach zero
and the DM device may be suspended. There is a possibility that the
suspend races with dm_stats_account_io.

Fix this by adding percpu "pending_io" counters to track outstanding
dm_io. Move kicking of suspend queue to dm_io_dec_pending(). Also,
rename md_in_flight_bios() to dm_in_flight_bios() and update it to
iterate all pending_io counters.

Fixes: d208b89401 ("dm: fix mempool NULL pointer race when completing IO")
Cc: stable@vger.kernel.org
Co-developed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-21 15:19:08 -05:00
Yahu Gao
bcd2be7632 block/bfq_wf2q: correct weight to ioprio
The return value is ioprio * BFQ_WEIGHT_CONVERSION_COEFF or 0.
What we want is ioprio or 0.
Correct this by changing the calculation.

Signed-off-by: Yahu Gao <gaoyahu19@gmail.com>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Link: https://lore.kernel.org/r/20220107065859.25689-1-gaoyahu19@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 20:09:14 -07:00
David Jeffery
8f5fea65b0 blk-mq: avoid extending delays of active hctx from blk_mq_delay_run_hw_queues
When blk_mq_delay_run_hw_queues sets an hctx to run in the future, it can
reset the delay length for an already pending delayed work run_work. This
creates a scenario where multiple hctx may have their queues set to run,
but if one runs first and finds nothing to do, it can reset the delay of
another hctx and stall the other hctx's ability to run requests.

To avoid this I/O stall when an hctx's run_work is already pending,
leave it untouched to run at its current designated time rather than
extending its delay. The work will still run which keeps closed the race
calling blk_mq_delay_run_hw_queues is needed for while also avoiding the
I/O stall.

Signed-off-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220131203337.GA17666@redhat
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:46:20 -07:00
Christoph Hellwig
24b45e6c25 virtio_blk: simplify refcounting
Implement the ->free_disk method to free the virtio_blk structure only
once the last gendisk reference goes away instead of keeping a local
refcount.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20220215094514.3828912-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:44:24 -07:00
Christoph Hellwig
185ed423d1 memstick/mspro_block: simplify refcounting
Implement the ->free_disk method to free the msb_data structure only once
the last gendisk reference goes away instead of keeping a local
refcount.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220215094514.3828912-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:44:24 -07:00
Christoph Hellwig
6dab421bfe memstick/mspro_block: fix handling of read-only devices
Use set_disk_ro to propagate the read-only state to the block layer
instead of checking for it in ->open and leaking a reference in case
of a read-only device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220215094514.3828912-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:44:24 -07:00
Christoph Hellwig
e2efa07966 memstick/ms_block: simplify refcounting
Implement the ->free_disk method to free the msb_data structure only once
the last gendisk reference goes away instead of keeping a local refcount.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220215094514.3828912-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:44:24 -07:00
Christoph Hellwig
76792055c4 block: add a ->free_disk method
Add a method to notify the driver that the gendisk is about to be freed.
This allows drivers to tie the lifetime of their private data to that of
the gendisk and thus deal with device removal races without expensive
synchronization and boilerplate code.

A new flag is added so that ->free_disk is only called after a successful
call to add_disk, which significantly simplifies the error handling path
during probing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220215094514.3828912-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:44:24 -07:00
Ming Lei
34841e6fb1 block: revert 4f1e9630af ("blk-throtl: optimize IOPS throttle for large IO scenarios")
Revert commit 4f1e9630af ("blk-throtl: optimize IOPS throttle for large
IO scenarios") since we have another easier way to address this issue and
get better iops throttling result.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-9-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:42:28 -07:00
Ming Lei
5a93b6027e block: don't try to throttle split bio if iops limit isn't set
We need to throttle split bio in case of IOPS limit even though the
split bio has been marked as BIO_THROTTLED since block layer
accounts split bio actually.

If only throughput throttle is setup, no need to throttle any more
if BIO_THROTTLED is set since we have accounted & considered the
whole bio bytes already.

Add one flag of THROTL_TG_HAS_IOPS_LIMIT for serving this purpose.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-8-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:42:28 -07:00
Ming Lei
9f5ede3c01 block: throttle split bio in case of iops limit
Commit 111be88398 ("block-throttle: avoid double charge") marks bio as
BIO_THROTTLED unconditionally if __blk_throtl_bio() is called on this bio,
then this bio won't be called into __blk_throtl_bio() any more. This way
is to avoid double charge in case of bio splitting. It is reasonable for
read/write throughput limit, but not reasonable for IOPS limit because
block layer provides io accounting against split bio.

Chunguang Xu has already observed this issue and fixed it in commit
4f1e9630af ("blk-throtl: optimize IOPS throttle for large IO scenarios").
However, that patch only covers bio splitting in __blk_queue_split(), and
we have other kind of bio splitting, such as bio_split() &
submit_bio_noacct() and other ways.

This patch tries to fix the issue in one generic way by always charging
the bio for iops limit in blk_throtl_bio(). This way is reasonable:
re-submission & fast-cloned bio is charged if it is submitted to same
disk/queue, and BIO_THROTTLED will be cleared if bio->bi_bdev is changed.

This new approach can get much more smooth/stable iops limit compared with
commit 4f1e9630af ("blk-throtl: optimize IOPS throttle for large IO
scenarios") since that commit can't throttle current split bios actually.

Also this way won't cause new double bio iops charge in
blk_throtl_dispatch_work_fn() in which blk_throtl_bio() won't be called
any more.

Reported-by: Ning Li <lining2020x@163.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:42:28 -07:00
Ming Lei
d24c670ec1 block: merge submit_bio_checks() into submit_bio_noacct
Now submit_bio_checks() is only called by submit_bio_noacct(), so merge
it into submit_bio_noacct().

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-6-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:42:28 -07:00
Ming Lei
3f98c75371 block: don't check bio in blk_throtl_dispatch_work_fn
The bio has been checked already before throttling, so no need to check
it again before dispatching it from throttle queue.

Add a helper of submit_bio_noacct_nocheck() for this purpose.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220216044514.2903784-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:42:28 -07:00
Ming Lei
29ff23624e block: don't declare submit_bio_checks in local header
submit_bio_checks() won't be called outside of block/blk-core.c any more
since commit 9d497e2941 ("block: don't protect submit_bio_checks by
q_usage_counter"), so mark it as one local helper.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:42:28 -07:00
Ming Lei
7f36b7d02a block: move blk_crypto_bio_prep() out of blk-mq.c
blk_crypto_bio_prep() is called for both bio based and blk-mq drivers,
so move it out of blk-mq.c, then we can unify this kind of handling.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:42:27 -07:00
Ming Lei
a650628bde block: move submit_bio_checks() into submit_bio_noacct
It is more clean & readable to check bio when starting to submit it,
instead of just before calling ->submit_bio() or blk_mq_submit_bio().

Also it provides us chance to optimize bio submission without checking
bio.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:42:27 -07:00
Christoph Hellwig
9f9adea718 dm: remove dm_dispatch_clone_request
Fold dm_dispatch_clone_request into it's only caller, and use a switch
statement to single dispatch for the handling of the different return
values from blk_insert_cloned_request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220215100540.3892965-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:39:10 -07:00
Christoph Hellwig
8803c89f36 dm: remove useless code from dm_dispatch_clone_request
Both ->start_time_ns and the RQF_IO_STAT are set when the request is
allocated using blk_mq_alloc_request by dm-mpath in blk_mq_rq_ctx_init.
The block layer also ensures ->start_time_ns is only set when actually
needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220215100540.3892965-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:39:10 -07:00
Christoph Hellwig
28db4711bf blk-mq: remove the request_queue argument to blk_insert_cloned_request
The request must be submitted to the queue it was allocated for, so
remove the extra request_queue argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220215100540.3892965-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:39:10 -07:00
Christoph Hellwig
a5efda3c46 blk-mq: fold blk_cloned_rq_check_limits into blk_insert_cloned_request
Fold blk_cloned_rq_check_limits into its only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220215100540.3892965-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:39:09 -07:00
Christoph Hellwig
248c793359 blk-mq: make the blk-mq stacking code optional
The code to stack blk-mq drivers is only used by dm-multipath, and
will preferably stay that way.  Make it optional and only selected
by device mapper, so that the buildbots more easily catch abuses
like the one that slipped in in the ufs driver in the last merged
window.  Another positive side effects is that kernel builds without
device mapper shrink a little bit as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220215100540.3892965-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:39:09 -07:00
Chengming Zhou
f122d103b5 blk-cgroup: set blkg iostat after percpu stat aggregation
Don't need to do blkg_iostat_set for top blkg iostat on each CPU,
so move it after percpu stat aggregation.

Fixes: ef45fe470e ("blk-cgroup: show global disk stats in root cgroup io.stat")
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220213085902.88884-1-zhouchengming@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-15 14:13:12 -07:00
Chaitanya Kulkarni
ec9fd2a13d blk-lib: don't check bdev_get_queue() NULL check
Based on the comment present in the bdev_get_queue()
bdev->bd_queue can never be NULL. Remove the NULL check for the local
variable q that is set from bdev_get_queue() for discard, write_same,
and write_zeroes.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220215115247.11717-2-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-15 07:51:46 -07:00
Christoph Hellwig
69591a402d block: remove biodoc.rst
This document is completely out of date and extremely misleading. In
general the existing kerneldoc comment serve as a much better
documentation of the still existing functionality, while the history
blurbs are pretty much irrelevant today.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220215081047.3693582-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-15 07:47:52 -07:00
Barry Song
2e2f0199a2 docs: block: biodoc.rst: Drop the obsolete and incorrect content
Since commit 7eaceaccab ("block: remove per-queue plugging"), kernel
has removed blk_run_address_space(), blk_unplug() and sync_buffer(),
and moved to on-stack plugging. The document has been obsolete for
years.
Given that there is no obvious counterparts in the new mechinism to
replace old APIs, this patch drops the content directly.

Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Link: https://lore.kernel.org/r/20220207074931.20067-1-song.bao.hua@hisilicon.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-11 15:07:43 -07:00
Ming Lei
672fdcf0e7 block: partition include/linux/blk-cgroup.h
Partition include/linux/blk-cgroup.h into two parts: one is public part,
the other is block layer private part.

Suggested by Christoph Hellwig.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220211101149.2368042-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-11 10:02:41 -07:00
Ming Lei
472e4314c0 block: move initialization of q->blkg_list into blkcg_init_queue
q->blkg_list is only used by blkcg code, so move it into
blkcg_init_queue.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220211101149.2368042-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-11 10:02:41 -07:00
Ming Lei
0e51e2ab49 block: remove THROTL_IOPS_MAX
No one uses THROTL_IOPS_MAX any more, so remove it.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220211101149.2368042-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-11 10:02:41 -07:00
Yang Shi
d5869fdc18 block: introduce block_rq_error tracepoint
Currently, rasdaemon uses the existing tracepoint block_rq_complete
and filters out non-error cases in order to capture block disk errors.

But there are a few problems with this approach:

1. Even kernel trace filter could do the filtering work, there is
   still some overhead after we enable this tracepoint.

2. The filter is merely based on errno, which does not align with kernel
   logic to check the errors for print_req_error().

3. block_rq_complete only provides dev major and minor to identify
   the block device, it is not convenient to use in user-space.

So introduce a new tracepoint block_rq_error just for the error case.
With this patch, rasdaemon could switch to block_rq_error.

Since the new tracepoint has the similar implementation with
block_rq_complete, so move the existing code from TRACE_EVENT
block_rq_complete() into new event class block_rq_completion(). Then add
event for block_rq_complete and block_rq_err respectively from the newly
created event class per the suggestion from Chaitanya Kulkarni.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220210225222.260069-1-shy828301@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-11 10:00:16 -07:00
John Garry
3f607293b7 sbitmap: Delete old sbitmap_queue_get_shallow()
Since __sbitmap_queue_get_shallow() was introduced in commit c05e667337
("sbitmap: add sbitmap_get_shallow() operation"), it has not been used.

Delete __sbitmap_queue_get_shallow() and rename public
__sbitmap_queue_get_shallow() -> sbitmap_queue_get_shallow() as it is odd
to have public __foo but no foo at all.

Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1644322024-105340-1-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-08 06:55:49 -07:00
Ming Lei
3301bc5335 lib/sbitmap: kill 'depth' from sbitmap_word
Only the last sbitmap_word can have different depth, and all the others
must have same depth of 1U << sb->shift, so not necessary to store it in
sbitmap_word, and it can be retrieved easily and efficiently by adding
one internal helper of __map_depth(sb, index).

Remove 'depth' field from sbitmap_word, then the annotation of
____cacheline_aligned_in_smp for 'word' isn't needed any more.

Not see performance effect when running high parallel IOPS test on
null_blk.

This way saves us one cacheline(usually 64 words) per each sbitmap_word.

Cc: Martin Wilck <martin.wilck@suse.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/20220110072945.347535-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-08 06:54:50 -07:00