Commit Graph

983484 Commits

Author SHA1 Message Date
Jeffle Xu
5b0fab5089 dm table: fix DAX iterate_devices based device capability checks
Fix dm_table_supports_dax() and invert logic of both
iterate_devices_callout_fn so that all devices' DAX capabilities are
properly checked.

Fixes: 545ed20e6d ("dm: add infrastructure for DAX support")
Cc: stable@vger.kernel.org
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-09 08:45:30 -05:00
Jeffle Xu
a4c8dd9c2d dm table: fix iterate_devices based device capability checks
According to the definition of dm_iterate_devices_fn:
 * This function must iterate through each section of device used by the
 * target until it encounters a non-zero return code, which it then returns.
 * Returns zero if no callout returned non-zero.

For some target type (e.g. dm-stripe), one call of iterate_devices() may
iterate multiple underlying devices internally, in which case a non-zero
return code returned by iterate_devices_callout_fn will stop the iteration
in advance. No iterate_devices_callout_fn should return non-zero unless
device iteration should stop.

Rename dm_table_requires_stable_pages() to dm_table_any_dev_attr() and
elevate it for reuse to stop iterating (and return non-zero) on the
first device that causes iterate_devices_callout_fn to return non-zero.
Use dm_table_any_dev_attr() to properly iterate through devices.

Rename device_is_nonrot() to device_is_rotational() and invert logic
accordingly to fix improper disposition.

Fixes: c3c4555edd ("dm table: clear add_random unless all devices have it set")
Fixes: 4693c9668f ("dm table: propagate non rotational flag")
Cc: stable@vger.kernel.org
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-09 08:45:25 -05:00
Mikulas Patocka
054bee1616 dm writecache: return the exact table values that were set
LVM doesn't like it when the target returns different values from what
was set in the constructor. Fix dm-writecache so that the returned
table values are exactly the same as requested values.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-08 11:32:54 -05:00
Ahmad Fatoum
363880c4eb dm crypt: support using trusted keys
Commit 27f5411a71 ("dm crypt: support using encrypted keys") extended
dm-crypt to allow use of "encrypted" keys along with "user" and "logon".

Along the same lines, teach dm-crypt to support "trusted" keys as well.

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-03 10:13:00 -05:00
Ahmad Fatoum
831475cc0b dm crypt: replaced #if defined with IS_ENABLED
IS_ENABLED(CONFIG_ENCRYPTED_KEYS) is true whether the option is built-in
or a module, so use it instead of #if defined checking for each
separately.

The other #if was to avoid a static function defined, but unused
warning. As we now always build the callsite when the function
is defined, we can remove that first #if guard.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Acked-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-03 10:12:31 -05:00
Tian Tao
21ec672ecf dm writecache: fix unnecessary NULL check warnings
Remove NULL checks before vfree() to fix these warnings:
./drivers/md/dm-writecache.c:2008:2-7: WARNING: NULL check before some
freeing functions is not needed.
./drivers/md/dm-writecache.c:2024:2-7: WARNING: NULL check before some
freeing functions is not needed.

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-03 10:10:05 -05:00
Mikulas Patocka
cb728484a7 dm writecache: fix performance degradation in ssd mode
Fix a thinko in ssd_commit_superblock. region.count is in sectors, not
bytes. This bug doesn't corrupt data, but it causes performance
degradation.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: dc8a01ae1d ("dm writecache: optimize superblock write")
Cc: stable@vger.kernel.org # v5.7+
Reported-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-03 10:10:05 -05:00
Mikulas Patocka
09d85f8d89 dm integrity: introduce the "fix_hmac" argument
The "fix_hmac" argument improves security of internal_hash and
journal_mac:
- the section number is mixed to the mac, so that an attacker can't
  copy sectors from one journal section to another journal section
- the superblock is protected by journal_mac
- a 16-byte salt stored in the superblock is mixed to the mac, so
  that the attacker can't detect that two disks have the same hmac
  key and also to disallow the attacker to move sectors from one
  disk to another

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Daniel Glockner <dg@emlix.com>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> # ReST fix
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-03 10:10:05 -05:00
Jinoh Kang
4c9e9883c2 dm persistent data: fix return type of shadow_root()
shadow_root() truncates 64-bit dm_block_t into 32-bit int.  This is
not an issue in practice, since dm metadata as of v5.11 can only hold at
most 4161600 blocks (255 index entries * ~16k metadata blocks).

Nevertheless, this can confuse users debugging some specific data
corruption scenarios.  Also, DM_SM_METADATA_MAX_BLOCKS may be bumped in
the future, or persistent-data may find its use in other places.

Therefore, switch the return type of shadow_root from int to dm_block_t.

Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-03 10:10:05 -05:00
Jeffle Xu
62f263178c dm: cleanup of front padding calculation
Add two helper macros calculating the offset of bio in struct dm_io and
struct dm_target_io respectively.

Besides, simplify the front padding calculation in
dm_alloc_md_mempools().

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-03 10:10:05 -05:00
Colin Ian King
23c4ecbc3e dm integrity: fix spelling mistake "flusing" -> "flushing"
There is a spelling mistake in a dm_integrity_io_error error
message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-03 10:10:05 -05:00
Geert Uytterhoeven
74d1da3988 dm crypt: Spelling s/cihper/cipher/
Fix a misspelling of "cipher".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-03 10:10:05 -05:00
Tom Rix
892c7a77f6 dm dust: remove h from printk format specifier
See Documentation/core-api/printk-formats.rst.

commit cbacb5ab0a ("docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi]")

Standard integer promotion is already done and %hx and %hhx is useless
so do not encourage the use of %hh[xudi] or %h[xudi].

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-02-03 10:10:04 -05:00
Ming Lei
8358c28a5d block: fix memory leak of bvec
bio_init() clears bio instance, so the bvec index has to be set after
bio_init(), otherwise bio->bi_io_vec may be leaked.

Fixes: 3175199ab0 ("block: split bio_kmalloc from bio_alloc_bioset")
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-02 08:57:56 -07:00
Christoph Hellwig
a42e0d70c5 md: use rdev_read_only in restart_array
Make the read-only check in restart_array identical to the other two
read-only checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-01 09:35:20 -07:00
Christoph Hellwig
d7a4783883 md: check for NULL ->meta_bdev before calling bdev_read_only
->meta_bdev is optional and not set for most arrays.  Add a
rdev_read_only helper that calls bdev_read_only for both devices
in a safe way.

Fixes: 6f0d9689b6 ("block: remove the NULL bdev check in bdev_read_only")
Reported-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-01 09:35:20 -07:00
Lukas Bulwahn
f7bf5e24e0 block: drop removed argument from kernel-doc of blk_execute_rq()
Commit 684da7628d ("block: remove unnecessary argument from
blk_execute_rq") changes the signature of blk_execute_rq(), but misses
to adjust its kernel-doc.

Hence, make htmldocs warns on ./block/blk-exec.c:78:

  warning: Excess function parameter 'q' description in 'blk_execute_rq'

Drop removed argument from kernel-doc of blk_execute_rq() as well.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Guoqing Jiang <Guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-29 07:43:29 -07:00
Lukas Bulwahn
7f31bee360 block: remove typo in kernel-doc of set_disk_ro()
Commit 52f019d43c ("block: add a hard-readonly flag to struct gendisk")
provides some kernel-doc for set_disk_ro(), but introduces a small typo.

Hence, make htmldocs warns on ./block/genhd.c:1441:

  warning: Function parameter or member 'read_only' not described in 'set_disk_ro'
  warning: Excess function parameter 'ready_only' description in 'set_disk_ro'

Remove that typo in the kernel-doc for set_disk_ro().

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-29 07:15:50 -07:00
Baolin Wang
6b4eeba331 blk-cgroup: Remove obsolete macro
Remove the obsolete 'MAX_KEY_LEN' macro.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-28 07:33:36 -07:00
Chaitanya Kulkarni
59c157433f nvme-core: check bdev value for NULL
The nvme-core sets the bdev to NULL when admin comamnd is issued from
IOCTL in the following path e.g. nvme list :-

block_ioctl()
 blkdev_ioctl()
  nvme_ioctl()
   nvme_user_cmd()
    nvme_submit_user_cmd()

The commit 309dca309f ("block: store a block_device pointer in struct bio")
now uses bdev unconditionally in the macro bio_set_dev() and assumes
that bdev value is not NULL which results in the following crash in
since thats where bdev is actually accessed :-

void bio_associate_blkg_from_css(struct bio *bio,
				 struct cgroup_subsys_state *css)
{
	if (bio->bi_blkg)
		blkg_put(bio->bi_blkg);

	if (css && css->parent) {
		bio->bi_blkg = blkg_tryget_closest(bio, css);
	} else {
-------------->	blkg_get(bio->bi_bdev->bd_disk->queue->root_blkg);
		bio->bi_blkg = bio->bi_bdev->bd_disk->queue->root_blkg;
	}
}
EXPORT_SYMBOL_GPL(bio_associate_blkg_from_css);

[  345.385947] BUG: kernel NULL pointer dereference, address: 0000000000000690
[  345.387103] #PF: supervisor read access in kernel mode
[  345.387894] #PF: error_code(0x0000) - not-present page
[  345.388756] PGD 162a2b067 P4D 162a2b067 PUD 1633eb067 PMD 0
[  345.389625] Oops: 0000 [#1] SMP NOPTI
[  345.390206] CPU: 15 PID: 4100 Comm: nvme Tainted: G           OE     5.11.0-rc5blk+ #141
[  345.391377] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba52764
[  345.393074] RIP: 0010:bio_associate_blkg_from_css.cold.47+0x58/0x21f

[  345.396362] RSP: 0018:ffffc90000dbbce8 EFLAGS: 00010246
[  345.397078] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
[  345.398114] RDX: 0000000000000000 RSI: ffff888813be91f0 RDI: ffff888813be91f8
[  345.399039] RBP: ffffc90000dbbd30 R08: 0000000000000001 R09: 0000000000000001
[  345.399950] R10: 0000000064c66670 R11: 00000000ef955201 R12: ffff888812d32800
[  345.401031] R13: 0000000000000000 R14: ffff888113e51540 R15: ffff888113e51540
[  345.401976] FS:  00007f3747f1d780(0000) GS:ffff888813a00000(0000) knlGS:0000000000000000
[  345.402997] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  345.403737] CR2: 0000000000000690 CR3: 000000081a4bc000 CR4: 00000000003506e0
[  345.404685] Call Trace:
[  345.405031]  bio_associate_blkg+0x71/0x1c0
[  345.405649]  nvme_submit_user_cmd+0x1aa/0x38e [nvme_core]
[  345.406348]  nvme_user_cmd.isra.73.cold.98+0x54/0x92 [nvme_core]
[  345.407117]  nvme_ioctl+0x226/0x260 [nvme_core]
[  345.407707]  blkdev_ioctl+0x1c8/0x2b0
[  345.408183]  block_ioctl+0x3f/0x50
[  345.408627]  __x64_sys_ioctl+0x84/0xc0
[  345.409117]  do_syscall_64+0x33/0x40
[  345.409592]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  345.410233] RIP: 0033:0x7f3747632107

[  345.413125] RSP: 002b:00007ffe461b6648 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[  345.414086] RAX: ffffffffffffffda RBX: 00000000007b7fd0 RCX: 00007f3747632107
[  345.414998] RDX: 00007ffe461b6650 RSI: 00000000c0484e41 RDI: 0000000000000004
[  345.415966] RBP: 0000000000000004 R08: 00000000007b7fe8 R09: 00000000007b9080
[  345.416883] R10: 00007ffe461b62c0 R11: 0000000000000206 R12: 00000000007b7fd0
[  345.417808] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000000

Add a NULL check before we set the bdev for bio.

This issue is found on block/for-next tree.

Fixes: 309dca309f ("block: store a block_device pointer in struct bio")
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 10:10:15 -07:00
Jens Axboe
3e3126cf2a mm: only make map_swap_entry available for CONFIG_HIBERNATION
Current tree spews this on compile:

mm/swapfile.c:2290:17: warning: ‘map_swap_entry’ defined but not used [-Wunused-function]
 2290 | static sector_t map_swap_entry(swp_entry_t entry, struct block_device **bdev)
       |                 ^~~~~~~~~~~~~~

if !CONFIG_HIBERNATION, as we don't use the function unless we have that
config option set.

Fixes: 48d15436fd ("mm: remove get_swap_bio")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 10:04:49 -07:00
Christoph Hellwig
48d15436fd mm: remove get_swap_bio
Just reuse the block_device and sector from the swap_info structure,
just as used by the SWP_SYNCHRONOUS path.  Also remove the checks for
NULL returns from bio_alloc as that can't happen for sleeping
allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:49 -07:00
Christoph Hellwig
64820ac6c6 nilfs2: remove cruft in nilfs_alloc_seg_bio
bio_alloc never returns NULL when it can sleep.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
6808f7af96 nfs/blocklayout: remove cruft in bl_alloc_init_bio
bio_alloc never returns NULL when it can sleep.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
e82ed3a4fb md/raid6: refactor raid5_read_one_chunk
Refactor raid5_read_one_chunk so that all simple checks are done
before allocating the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
6a59656968 md: remove md_bio_alloc_sync
md_bio_alloc_sync is never called with a NULL mddev, and ->sync_set is
initialized in md_run, so it always must be initialized as well.  Just
open code the remaining call to bio_alloc_bioset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
32637385b8 md: simplify sync_page_io
Use an on-stack bio and biovec for the single page synchronous I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
a78f18da66 md: remove bio_alloc_mddev
bio_alloc_mddev is never called with a NULL mddev, and ->bio_set is
initialized in md_run, so it always must be initialized as well.  Just
open code the remaining call to bio_alloc_bioset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
ae7153f1a7 drbd: remove drbd_req_make_private_bio
Open code drbd_req_make_private_bio in the two callers to prepare
for further changes.  Also don't bother to initialize bi_next as the
bio code already does that that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
19304f959f drbd: remove bio_alloc_drbd
Given that drbd_md_io_bio_set is initialized during module initialization
and the module fails to load if the initialization fails there is no need
to fall back to plain bio_alloc.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
67883ade7a f2fs: remove FAULT_ALLOC_BIO
Sleeping bio allocations do not fail, which means that injecting an error
into sleeping bio allocations is a little silly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
25ac84262c f2fs: use blkdev_issue_flush in __submit_flush_wait
Use the blkdev_issue_flush helper instead of duplicating it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
a587daa064 dm-clone: use blkdev_issue_flush in commit_metadata
Use blkdev_issue_flush instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
c6bf3f0e25 block: use an on-stack bio in blkdev_issue_flush
There is no point in allocating memory for a synchronous flush.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
3175199ab0 block: split bio_kmalloc from bio_alloc_bioset
bio_kmalloc shares almost no logic with the bio_set based fast path
in bio_alloc_bioset.  Split it into an entirely separate implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
4eb1d68904 blk-crypto: use bio_kmalloc in blk_crypto_clone_bio
Use bio_kmalloc instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
616c6a6884 btrfs: use bio_kmalloc in __alloc_device
Use bio_kmalloc instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Christoph Hellwig
f91ca2a370 zonefs: use bio_alloc in zonefs_file_dio_append
Use bio_alloc instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00
Jan Kara
7684fbde45 bfq: Use only idle IO periods for think time calculations
Currently whenever bfq queue has a request queued we add now -
last_completion_time to the think time statistics. This is however
misleading in case the process is able to submit several requests in
parallel because e.g. if the queue has request completed at time T0 and
then queues new requests at times T1, T2, then we will add T1-T0 and
T2-T0 to think time statistics which just doesn't make any sence (the
queue's think time is penalized by the queue being able to submit more
IO). So add to think time statistics only time intervals when the queue
had no IO pending.

Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
[axboe: fix whitespace on empty line]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:16:00 -07:00
Jan Kara
28c6def009 bfq: Use 'ttime' local variable
Use local variable 'ttime' instead of dereferencing bfqq.

Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:15:38 -07:00
Jan Kara
41e76c8566 bfq: Avoid false bfq queue merging
bfq_setup_cooperator() uses bfqd->in_serv_last_pos so detect whether it
makes sense to merge current bfq queue with the in-service queue.
However if the in-service queue is freshly scheduled and didn't dispatch
any requests yet, bfqd->in_serv_last_pos is stale and contains value
from the previously scheduled bfq queue which can thus result in a bogus
decision that the two queues should be merged. This bug can be observed
for example with the following fio jobfile:

[global]
direct=0
ioengine=sync
invalidate=1
size=1g
rw=read

[reader]
numjobs=4
directory=/mnt

where the 4 processes will end up in the one shared bfq queue although
they do IO to physically very distant files (for some reason I was able to
observe this only with slice_idle=1ms setting).

Fix the problem by invalidating bfqd->in_serv_last_pos when switching
in-service queue.

Fixes: 058fdecc6d ("block, bfq: fix in-service-queue check for queue merging")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:15:38 -07:00
Chunguang Xu
49d1822bc0 blkcg: delete redundant get/put operations for queue
When calling blkcg_schedule_throttle(), for the same queue,
redundant get/put operations can be removed.

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26 13:14:30 -07:00
Christoph Hellwig
2c2b9fd6b4 block: unexport truncate_bdev_range
truncate_bdev_range is only used in always built-in block layer code,
so remove the export and the !CONFIG_BLOCK stub.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26 13:14:03 -07:00
Lei Chen
482e302a61 blk: wbt: remove unused parameter from wbt_should_throttle
The first parameter rwb is not used for this function.
So just remove it.

Signed-off-by: Lei Chen <lennychen@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26 13:13:00 -07:00
Jan Kara
767630c63b bdev: Do not return EBUSY if bdev discard races with write
blkdev_fallocate() tries to detect whether a discard raced with an
overlapping write by calling invalidate_inode_pages2_range(). However
this check can give both false negatives (when writing using direct IO
or when writeback already writes out the written pagecache range) and
false positives (when write is not actually overlapping but ends in the
same page when blocksize < pagesize). This actually causes issues for
qemu which is getting confused by EBUSY errors.

Fix the problem by removing this conflicting write detection since it is
inherently racy and thus of little use anyway.

Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
CC: "Darrick J. Wong" <darrick.wong@oracle.com>
Link: https://lore.kernel.org/qemu-devel/20201111153913.41840-1-mlevitsk@redhat.com
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26 10:22:18 -07:00
Christoph Hellwig
46bbf653a6 block: inherit BIO_REMAPPED when cloning bios
Cloned bios are can be used to on the same device, in which case we need
to inherit the BIO_REMAPPED flag to avoid a double partition remap.  When
the cloned bios are used on another device, bio_set_dev will clear the flag.

Fixes: 309dca309f ("block: store a block_device pointer in struct bio")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26 08:50:01 -07:00
Christoph Hellwig
f65b95fe0c bcache: use bio_set_dev to assign ->bi_bdev
Always use the bio_set_dev helper to assign ->bi_bdev to make sure
other state related to the device is uptodate.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26 08:50:01 -07:00
Christoph Hellwig
a7c7f7b2b6 nvme: use bio_set_dev to assign ->bi_bdev
Always use the bio_set_dev helper to assign ->bi_bdev to make sure
other state related to the device is uptodate.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26 08:50:01 -07:00
Jens Axboe
a5bf0a92e1 bfq: bfq_check_waker() should be static
It's only used in the same file, mark is appropriately static.

Fixes: 71217df39d ("block, bfq: make waker-queue detection more robust")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25 21:15:01 -07:00
Paolo Valente
71217df39d block, bfq: make waker-queue detection more robust
In the presence of many parallel I/O flows, the detection of waker
bfq_queues suffers from false positives. This commits addresses this
issue by making the filtering of actual wakers more selective. In more
detail, a candidate waker must be found to meet waker requirements
three times before being promoted to actual waker.

Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25 14:18:37 -07:00