-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmGKqAcQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpojrD/4yA+GgV+jWeIepYWvU81TQFpt9AJmzWbrY
uryj4dy7EdMjun+JkAP8k4qreqvTZRsJMkr9dhmS4qaM8/Vt8K/RU/0n/lxNVmqc
1//ZaTS6DURVAc52GHIXD3q4cv8pHofTZZlrj1Hgz35shlOayStGJtktH5f8uQl4
5Yxjh+HKr15Chym+fKlbR6T7BgVxxNyhT9q89BgUwMAJX+1KRVtwtkyVK5IbObFy
zOeiC+n9niQ6iJHcLoqb7LjfBOs/VjdNOQYGSCAnrBxuQ8GnEP2xDw2nvFlOPE12
5tWEwTgAX7381ilbL6VvNTlTafIs/Axt8mI0cY/OMW7ApiHwO3rXjQSqA4yrnKCJ
h6M1QavqThd2DtMnOi0U5wwgtD2UjS+CMpK5XFxeIyl6GqTgZcaWm3VqRnG68KZD
r5+o99GKWCHy0cckxq2WiWJouReeNZ9u9R6HNDw0Vb8UNyWgBR+v2MkX+SHS/c85
2gXm10hwBH7BFnC4X8ceiuT/bm7xm9S6D/3LCVitlUTBRfqobsQEQjSciPeoOtL0
rRSTKob7jtokiB2q01wx3q1jnUMpxE1fqJkpLjUvebTzw+a+xfPwy0nNTGq0XXIv
WMVRRpSWCZm04Ru0q/K8cj0GOyur5x+ilefZ1V+/sRU5dVmGuJgbJUxei1HPC6eV
z9Rn0aFv4g==
=1GPi
-----END PGP SIGNATURE-----
Merge tag 'for-5.16/block-2021-11-09' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Set of fixes for the batched tag allocation (Ming, me)
- add_disk() error handling fix (Luis)
- Nested queue quiesce fixes (Ming)
- Shared tags init error handling fix (Ye)
- Misc cleanups (Jean, Ming, me)
* tag 'for-5.16/block-2021-11-09' of git://git.kernel.dk/linux-block:
nvme: wait until quiesce is done
scsi: make sure that request queue queiesce and unquiesce balanced
scsi: avoid to quiesce sdev->request_queue two times
blk-mq: add one API for waiting until quiesce is done
blk-mq: don't free tags if the tag_set is used by other device in queue initialztion
block: fix device_add_disk() kobject_create_and_add() error handling
block: ensure cached plug request matches the current queue
block: move queue enter logic into blk_mq_submit_bio()
block: make bio_queue_enter() fast-path available inline
block: split request allocation components into helpers
block: have plug stored requests hold references to the queue
blk-mq: update hctx->nr_active in blk_mq_end_request_batch()
blk-mq: add RQF_ELV debug entry
blk-mq: only try to run plug merge if request has same queue with incoming bio
block: move RQF_ELV setting into allocators
dm: don't stop request queue after the dm device is suspended
block: replace always false argument with 'false'
block: assign correct tag before doing prefetch of request
blk-mq: fix redundant check of !e expression
subsystem. Also enhance both the integrity and crypt targets to emit
events to via dm-audit.
- Various other simple code improvements and cleanups.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmGJlFkACgkQxSPxCi2d
A1pqwwf/YZ6kNKRQaKF1mbkkHOxa/ULf7qIhi/R0epwJu4j1RGsCACS34EqzLc4c
x15h6flCNj1IBVAqTvMUETYTjTLtyrcfD0yBRWYw2RL0ksHMHyMvd1r/7aE64+pj
EeZk9Xzcx3Gsq9GOzKfYA2AX0PrypkKSjgHK7hgv+Jh5heqkFcnMXSl3l7BQ6vbr
ue9joPSI7+6eVFMDn32KxyHzfm6zZo1nmKZ6tQBBHD1D9yBqWTAhXiyXhRA+BOYH
Tg5wE1fvZ/htyZNEc1cMRArzLF6q9pEU4r8j472N6IcJbhIJzSu0V60zVvexNWG3
fJSIWqlta1KFK8SQttmDmfFnJiFcyw==
=t097
-----END PGP SIGNATURE-----
Merge tag 'for-5.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Add DM core support for emitting audit events through the audit
subsystem. Also enhance both the integrity and crypt targets to emit
events to via dm-audit.
- Various other simple code improvements and cleanups.
* tag 'for-5.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm table: log table creation error code
dm: make workqueue names device-specific
dm writecache: Make use of the helper macro kthread_run()
dm crypt: Make use of the helper macro kthread_run()
dm verity: use bvec_kmap_local in verity_for_bv_block
dm log writes: use memcpy_from_bvec in log_writes_map
dm integrity: use bvec_kmap_local in __journal_read_write
dm integrity: use bvec_kmap_local in integrity_metadata
dm: add add_disk() error handling
dm: Remove redundant flush_workqueue() calls
dm crypt: log aead integrity violations to audit subsystem
dm integrity: log audit events for dm-integrity target
dm: introduce audit event module for device mapper
For fixing queue quiesce race between driver and block layer(elevator
switch, update nr_requests, ...), we need to support concurrent quiesce
and unquiesce, which requires the two call to be balanced.
__bind() is only called from dm_swap_table() in which dm device has been
suspended already, so not necessary to stop queue again. With this way,
request queue quiesce and unquiesce can be balanced.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: e70feb8b3e ("blk-mq: support concurrent queue quiesce/unquiesce")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Link: https://lore.kernel.org/r/20211021145918.2691762-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We never checked for errors on add_disk() as this function returned
void. Now that this is fixed, use the shiny new error handling.
There are two calls to dm_setup_md_queue() which can fail then, one on
dm_early_create() and we can easily see that the error path there
calls dm_destroy in the error path. The other use case is on the ioctl
table_load case. If that fails userspace needs to call the
DM_DEV_REMOVE_CMD to cleanup the state - similar to any other
failure.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmF8KFsQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgph1ZEACwNuHkAZcIgNzKhzuLP9OjMhv9vV+q254G
/EcM31e+qgRioMd0ihbVsgW76jOwLEmb3ldKGcN+0Wo5+Sv9Im8+wAWYY1REOZO5
ZTUBfAzhEh63/EtqTFiU8U+7dmXqy4z7NaICnhlynjwkd3IT+I561os6kcqwJMMr
G+Q1Cnk9rgCMIoLOCoVThIpjmjyZzF33qJb2VEIkHfkot62iNdpABWaSASF+CCba
z8LfbvLAYz3YLl4thXlLJFU282T5y7gzgSomGvX4F0rMJSbqFbgoNEPxaYw9CvzC
uC6MnYCYdCdvVkWVm1b8I8LYzPd5GrpVOSh3JQGvuA4Ppv2IyJCDSruYGgVUlhao
cVPzuHCqNCfKk0ykYVRZy9oKiBk5wmFeKM/lSHu408y8VNraPNIAEpB6sA9qGr22
AYr8lNh3JDr0g8dtFsDOq+7u3MANW0KQozfzwTPZo6NjzEE1D2jIg39Ljiijo9+Y
3pU8pitIAhsKd2KhW1H6LmtJbF4dX756VKYDXOhzgORU0NZYgvGhBIj9tAdpQR0S
xeae5Kj0/wBGcqR/owf/n1EY/q7rWgNDETnsBhbmzMZyhwH3L6zhT+bfD8YoQCHY
ueyqhyIUe4YBxTrIpICqwDlqaMYAmQ0jRaci+bK9ovVlQ89FQ9o/BE2COPlI/DGX
w+rUmmoX4g==
=HiWU
-----END PGP SIGNATURE-----
Merge tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
- paride driver cleanups (Christoph)
- Remove cryptoloop support (Christoph)
- null_blk poll support (me)
- Now that add_disk() supports proper error handling, add it to various
drivers (Luis)
- Make ataflop actually work again (Michael)
- s390 dasd fixes (Stefan, Heiko)
- nbd fixes (Yu, Ye)
- Remove redundant wq flush in mtip32xx (Christophe)
- NVMe updates
- fix a multipath partition scanning deadlock (Hannes Reinecke)
- generate uevent once a multipath namespace is operational again
(Hannes Reinecke)
- support unique discovery controller NQNs (Hannes Reinecke)
- fix use-after-free when a port is removed (Israel Rukshin)
- clear shadow doorbell memory on resets (Keith Busch)
- use struct_size (Len Baker)
- add error handling support for add_disk (Luis Chamberlain)
- limit the maximal queue size for RDMA controllers (Max Gurtovoy)
- use a few more symbolic names (Max Gurtovoy)
- fix error code in nvme_rdma_setup_ctrl (Max Gurtovoy)
- add support for ->map_queues on FC (Saurav Kashyap)
- support the current discovery subsystem entry (Hannes Reinecke)
- use flex_array_size and struct_size (Len Baker)
- bcache fixes (Christoph, Coly, Chao, Lin, Qing)
- MD updates (Christoph, Guoqing, Xiao)
- Misc fixes (Dan, Ding, Jiapeng, Shin'ichiro, Ye)
* tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-block: (117 commits)
null_blk: Fix handling of submit_queues and poll_queues attributes
block: ataflop: Fix warning comparing pointer to 0
bcache: replace snprintf in show functions with sysfs_emit
bcache: move uapi header bcache.h to bcache code directory
nvmet: use flex_array_size and struct_size
nvmet: register discovery subsystem as 'current'
nvmet: switch check for subsystem type
nvme: add new discovery log page entry definitions
block: ataflop: more blk-mq refactoring fixes
block: remove support for cryptoloop and the xor transfer
mtd: add add_disk() error handling
rnbd: add error handling support for add_disk()
um/drivers/ubd_kern: add error handling support for add_disk()
m68k/emu/nfblock: add error handling support for add_disk()
xen-blkfront: add error handling support for add_disk()
bcache: add error handling support for add_disk()
dm: add add_disk() error handling
block: aoe: fixup coccinelle warnings
nvmet: use struct_size over open coded arithmetic
nvme: drop scan_lock and always kick requeue list when removing namespaces
...
blk_keyslot_manager is misnamed because it doesn't necessarily manage
keyslots. It actually does several different things:
- Contains the crypto capabilities of the device.
- Provides functions to control the inline encryption hardware.
Originally these were just for programming/evicting keyslots;
however, new functionality (hardware-wrapped keys) will require new
functions here which are unrelated to keyslots. Moreover,
device-mapper devices already (ab)use "keyslot_evict" to pass key
eviction requests to their underlying devices even though
device-mapper devices don't have any keyslots themselves (so it
really should be "evict_key", not "keyslot_evict").
- Sometimes (but not always!) it manages keyslots. Originally it
always did, but device-mapper devices don't have keyslots
themselves, so they use a "passthrough keyslot manager" which
doesn't actually manage keyslots. This hack works, but the
terminology is unnatural. Also, some hardware doesn't have keyslots
and thus also uses a "passthrough keyslot manager" (support for such
hardware is yet to be upstreamed, but it will happen eventually).
Let's stop having keyslot managers which don't actually manage keyslots.
Instead, rename blk_keyslot_manager to blk_crypto_profile.
This is a fairly big change, since for consistency it also has to update
keyslot manager-related function names, variable names, and comments --
not just the actual struct name. However it's still a fairly
straightforward change, as it doesn't change any actual functionality.
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20211018180453.40441-4-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In preparation for renaming struct blk_keyslot_manager to struct
blk_crypto_profile, rename the keyslot-manager.h and keyslot-manager.c
source files. Renaming these files separately before making a lot of
changes to their contents makes it easier for git to understand that
they were renamed.
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20211018180453.40441-3-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.
There are two calls to dm_setup_md_queue() which can fail then,
one on dm_early_create() and we can easily see that the error path
there calls dm_destroy in the error path. The other use case is on
the ioctl table_load case. If that fails userspace needs to call
the DM_DEV_REMOVE_CMD to cleanup the state - similar to any other
failure.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211015233028.2167651-4-mcgrof@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Replace the blk_poll interface that requires the caller to keep a queue
and cookie from the submissions with polling based on the bio.
Polling for the bio itself leads to a few advantages:
- the cookie construction can made entirely private in blk-mq.c
- the caller does not need to remember the request_queue and cookie
separately and thus sidesteps their lifetime issues
- keeping the device and the cookie inside the bio allows to trivially
support polling BIOs remapping by stacking drivers
- a lot of code to propagate the cookie back up the submission path can
be removed entirely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- Fix a race condition in the teardown path of raw mode pmem namespaces.
- Cleanup the code that filesystems use to detect filesystem-dax
capabilities of their underlying block device.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYTlBMgAKCRDfioYZHlFs
ZwQLAQCPhwpuOP+Byn7NksotnfmyLNyniK0mX7Me7PoLiyq0oAEAmqBwlr9YP7E3
NPzWiBzqPCvDIv1YG4C3Vam7ue1osgM=
=33O+
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
- Fix a race condition in the teardown path of raw mode pmem
namespaces.
- Cleanup the code that filesystems use to detect filesystem-dax
capabilities of their underlying block device.
* tag 'libnvdimm-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: remove bdev_dax_supported
xfs: factor out a xfs_buftarg_is_dax helper
dax: stub out dax_supported for !CONFIG_FS_DAX
dax: remove __generic_fsdax_supported
dax: move the dax_read_lock() locking into dax_supported
dax: mark dax_get_by_host static
dm: use fs_dax_get_by_bdev instead of dax_get_by_host
dax: stop using bdevname
fsdax: improve the FS_DAX Kconfig description and help text
libnvdimm/pmem: Fix crash triggered when I/O in-flight during unbind
There is no point in trying to finding the dax device if the DAX flag is
not set on the queue as none of the users of the device mapper exported
block devices could make use of the DAX capability.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210826135510.6293-4-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The end-users of DM devices/targets may remove and re-create the same
device multiple times. IMA does not measure such duplicate events if the
configuration CONFIG_IMA_DISABLE_HTABLE is set to 'n'.
To avoid confusion, the end-users need some indication on the client
if that configuration option is disabled.
Add a one-time warning during dm_init() if CONFIG_IMA_DISABLE_HTABLE
is set to 'n', to notify the end-users that duplicate events will not
be measured in the ima log. Also cleanup some whitespace in dm_init().
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
DM configures a block device with various target specific attributes
passed to it as a table. DM loads the table, and calls each target’s
respective constructors with the attributes as input parameters.
Some of these attributes are critical to ensure the device meets
certain security bar. Thus, IMA should measure these attributes, to
ensure they are not tampered with, during the lifetime of the device.
So that the external services can have high confidence in the
configuration of the block-devices on a given system.
Some devices may have large tables. And a given device may change its
state (table-load, suspend, resume, rename, remove, table-clear etc.)
many times. Measuring these attributes each time when the device
changes its state will significantly increase the size of the IMA logs.
Further, once configured, these attributes are not expected to change
unless a new table is loaded, or a device is removed and recreated.
Therefore the clear-text of the attributes should only be measured
during table load, and the hash of the active/inactive table should be
measured for the remaining device state changes.
Export IMA function ima_measure_critical_data() to allow measurement
of DM device parameters, as well as target specific attributes, during
table load. Compute the hash of the inactive table and store it for
measurements during future state change. If a load is called multiple
times, update the inactive table hash with the hash of the latest
populated table. So that the correct inactive table hash is measured
when the device transitions to different states like resume, remove,
rename, etc.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com> # leak fix
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
device mapper is currently the only outlier that tries to call
register_disk after add_disk, leading to fairly inconsistent state
of these block layer data structures. Instead change device-mapper
to just register the gendisk later now that the holder mechanism
can cope with that.
Note that this introduces a user visible change: the dm kobject is
now only visible after the initial table has been loaded.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move setting md->type from both callers into dm_setup_md_queue.
This ensures that md->type is only set to a valid value after the queue
has been fully setup, something we'll rely on future changes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
md->queue is now always set when md->disk is set, so simplify the
conditionals a bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
benefit both the DM thinp and cache targets.
- A few small DM kcopyd efficiency improvements.
- Significant zoned related block core, DM core and DM zoned target
changes that culminate with adding zoned append emulation (which is
required to properly fix DM crypt's zoned support).
- Various DM writecache target changes that improve efficiency. Adds
an optional "metadata_only" feature that only promotes bios flagged
with REQ_META. But the most significant improvement is writecache's
ability to pause writeback, for a confiurable time, if/when the
working set is larger than the cache (and the cache is full) -- this
ensures performance is no worse than the slower origin device.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmDcpWgACgkQxSPxCi2d
A1pD0AgAmySdpJxQBzBMOqnKaClErfxiWXDtvzBxFupG/jmqaN/k/kCFdKyDk89M
9r2rlv4+teZReEGjqjJ0umQgbX62x5y6f7vy4CeoE/+EQAUiZYXNARW8Uubu/Sgy
mmvsgAdiuJqfJCX5TiQDwZIdll/QV8isteddMpOdrdM0fpCNlTvRao4S9UE2Rfni
fPoPu7KNGDhKORvy/NloYFSHuxTaOSv6A44z15T2SoXPw9hLloFoXegE9Vrcfr/j
gwLX3ponp4+K91BzPWz0QIQ7Wh+7O4xrmcXtBIvuIGNcfV+oGMZMtq/zEX8T6sDh
GDlclxh/76iGgvINAQ437mXBINbPYQ==
=8dUv
-----END PGP SIGNATURE-----
Merge tag 'for-5.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Various DM persistent-data library improvements and fixes that
benefit both the DM thinp and cache targets.
- A few small DM kcopyd efficiency improvements.
- Significant zoned related block core, DM core and DM zoned target
changes that culminate with adding zoned append emulation (which is
required to properly fix DM crypt's zoned support).
- Various DM writecache target changes that improve efficiency. Adds an
optional "metadata_only" feature that only promotes bios flagged with
REQ_META. But the most significant improvement is writecache's
ability to pause writeback, for a confiurable time, if/when the
working set is larger than the cache (and the cache is full) -- this
ensures performance is no worse than the slower origin device.
* tag 'for-5.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (35 commits)
dm writecache: make writeback pause configurable
dm writecache: pause writeback if cache full and origin being written directly
dm io tracker: factor out IO tracker
dm btree remove: assign new_root only when removal succeeds
dm zone: fix dm_revalidate_zones() memory allocation
dm ps io affinity: remove redundant continue statement
dm writecache: add optional "metadata_only" parameter
dm writecache: add "cleaner" and "max_age" to Documentation
dm writecache: write at least 4k when committing
dm writecache: flush origin device when writing and cache is full
dm writecache: have ssd writeback wait if the kcopyd workqueue is busy
dm writecache: use list_move instead of list_del/list_add in writecache_writeback()
dm writecache: commit just one block, not a full page
dm writecache: remove unused gfp_t argument from wc_add_block()
dm crypt: Fix zoned block device support
dm: introduce zone append emulation
dm: rearrange core declarations for extended use from dm-zone.c
block: introduce BIO_ZONE_WRITE_LOCKED bio flag
block: introduce bio zone helpers
block: improve handling of all zones reset operation
...
Change the type and name of task_struct::state. Drop the volatile and
shrink it to an 'unsigned int'. Rename it in order to find all uses
such that we can use READ_ONCE/WRITE_ONCE as appropriate.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Link: https://lore.kernel.org/r/20210611082838.550736351@infradead.org
For zoned targets that cannot support zone append operations, implement
an emulation using regular write operations. If the original BIO
submitted by the user is a zone append operation, change its clone into
a regular write operation directed at the target zone write pointer
position.
To do so, an array of write pointer offsets (write pointer position
relative to the start of a zone) is added to struct mapped_device. All
operations that modify a sequential zone write pointer (writes, zone
reset, zone finish and zone append) are intersepted in __map_bio() and
processed using the new functions dm_zone_map_bio().
Detection of the target ability to natively support zone append
operations is done from dm_table_set_restrictions() by calling the
function dm_set_zones_restrictions(). A target that does not support
zone append operation, either by explicitly declaring it using the new
struct dm_target field zone_append_not_supported, or because the device
table contains a non-zoned device, has its mapped device marked with the
new flag DMF_ZONE_APPEND_EMULATED. The helper function
dm_emulate_zone_append() is introduced to test a mapped device for this
new flag.
Atomicity of the zones write pointer tracking and updates is done using
a zone write locking mechanism based on a bitmap. This is similar to
the block layer method but based on BIOs rather than struct request.
A zone write lock is taken in dm_zone_map_bio() for any clone BIO with
an operation type that changes the BIO target zone write pointer
position. The zone write lock is released if the clone BIO is failed
before submission or when dm_zone_endio() is called when the clone BIO
completes.
The zone write lock bitmap of the mapped device, together with a bitmap
indicating zone types (conv_zones_bitmap) and the write pointer offset
array (zwp_offset) are allocated and initialized with a full device zone
report in dm_set_zones_restrictions() using the function
dm_revalidate_zones().
For failed operations that may have modified a zone write pointer, the
zone write pointer offset is marked as invalid in dm_zone_endio().
Zones with an invalid write pointer offset are checked and the write
pointer updated using an internal report zone operation when the
faulty zone is accessed again by the user.
All functions added for this emulation have a minimal overhead for
zoned targets natively supporting zone append operations. Regular
device targets are also not affected. The added code also does not
impact builds with CONFIG_BLK_DEV_ZONED disabled by stubbing out all
dm zone related functions.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Move the definitions of struct dm_target_io, struct dm_io and the bits
of the flags field of struct mapped_device from dm.c to dm-core.h to
make them usable from dm-zone.c. For the same reason, declare
dec_pending() in dm-core.h after renaming it to dm_io_dec_pending().
And for symmetry of the function names, introduce the inline helper
dm_io_inc_pending() instead of directly using atomic_inc() calls.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
A target map method requesting the requeue of a bio with
DM_MAPIO_REQUEUE or completing it with DM_ENDIO_REQUEUE can cause
unaligned write errors if the bio is a write operation targeting a
sequential zone. If a zoned target request such a requeue, warn about
it and kill the IO.
The function dm_is_zone_write() is introduced to detect write operations
to zoned targets.
This change does not affect the target drivers supporting zoned devices
and exposing a zoned device, namely dm-crypt, dm-linear and dm-flakey as
none of these targets ever request a requeue.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Move core and table code used for zoned targets and conditionally
defined with #ifdef CONFIG_BLK_DEV_ZONED to the new file dm-zone.c.
This file is conditionally compiled depending on CONFIG_BLK_DEV_ZONED.
The small helper dm_set_zones_restrictions() is introduced to
initialize a mapped device request queue zone attributes in
dm_table_set_restrictions().
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fix dm_accept_partial_bio() to actually check that zone management
commands are not passed as explained in the function documentation
comment. Also, since a zone append operation cannot be split, add
REQ_OP_ZONE_APPEND as a forbidden command.
White lines are added around the group of BUG_ON() calls to make the
code more legible.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Convert the dm driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
These are only used by DM core, DM target modules should only use
dm_{get,put}_device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Remove useless "while" loop. If the condition ci.sector_count && !error is
true, we go to a branch that ends with "break". If this condition is
false, the "while" loop will not be executed again. So, the loop can't be
executed more than once.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
When a DM device is first created it doesn't yet have an established
capacity, therefore the use of set_capacity_and_notify() should be
conditional given the potential for needless pr_info "detected
capacity change" noise even if capacity is 0.
One could argue that the pr_info() in set_capacity_and_notify() is
misplaced, but that position is not held uniformly.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: f64d9b2eac ("dm: use set_capacity_and_notify")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The system would deadlock when swapping to a dm-crypt device. The reason
is that for each incoming write bio, dm-crypt allocates memory that holds
encrypted data. These excessive allocations exhaust all the memory and the
result is either deadlock or OOM trigger.
This patch limits the number of in-flight swap bios, so that the memory
consumed by dm-crypt is limited. The limit is enforced if the target set
the "limit_swap_bios" variable and if the bio has REQ_SWAP set.
Non-swap bios are not affected becuase taking the semaphore would cause
performance degradation.
This is similar to request-based drivers - they will also block when the
number of requests is over the limit.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Update the device-mapper core to support exposing the inline crypto
support of the underlying device(s) through the device-mapper device.
This works by creating a "passthrough keyslot manager" for the dm
device, which declares support for encryption settings which all
underlying devices support. When a supported setting is used, the bio
cloning code handles cloning the crypto context to the bios for all the
underlying devices. When an unsupported setting is used, the blk-crypto
fallback is used as usual.
Crypto support on each underlying device is ignored unless the
corresponding dm target opts into exposing it. This is needed because
for inline crypto to semantically operate on the original bio, the data
must not be transformed by the dm target. Thus, targets like dm-linear
can expose crypto support of the underlying device, but targets like
dm-crypt can't. (dm-crypt could use inline crypto itself, though.)
A DM device's table can only be changed if the "new" inline encryption
capabilities are a (*not* necessarily strict) superset of the "old" inline
encryption capabilities. Attempts to make changes to the table that result
in some inline encryption capability becoming no longer supported will be
rejected.
For the sake of clarity, key eviction from underlying devices will be
handled in a future patch.
Co-developed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fix dm_table_supports_dax() and invert logic of both
iterate_devices_callout_fn so that all devices' DAX capabilities are
properly checked.
Fixes: 545ed20e6d ("dm: add infrastructure for DAX support")
Cc: stable@vger.kernel.org
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Add two helper macros calculating the offset of bio in struct dm_io and
struct dm_target_io respectively.
Besides, simplify the front padding calculation in
dm_alloc_md_mempools().
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Replace the gendisk pointer in struct bio with a pointer to the newly
improved struct block device. From that the gendisk can be trivially
accessed with an extra indirection, but it also allows to directly
look up all information related to partition remapping.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There wasn't ever a real need to log an error in the kernel log for
ioctls issued with insufficient permissions. Simply return an error
and if an admin/user is sufficiently motivated they can enable DM's
dynamic debugging to see an explanation for why the ioctls were
disallowed.
Reported-by: Nir Soffer <nsoffer@redhat.com>
Fixes: e980f62353 ("dm: don't allow ioctls to targets that don't map to whole devices")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
- Fix DM verity to skip verity work if IO completes with error while
system is shutting down.
- Add new DM multipath "IO affinity" path selector that maps IO
destined to a given path to a specific CPU based on user provided
mapping.
- Rename DM multipath path selector source files to have "dm-ps"
prefix.
- Add REQ_NOWAIT support to some other simple DM targets that don't
block in more elaborate ways waiting for IO.
- Export DM crypt's kcryptd workqueue via sysfs (WQ_SYSFS).
- Fix error return code in DM's target_message() if empty message is
received.
- A handful of other small cleanups.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAl/iDJYTHHNuaXR6ZXJA
cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWsI2CACUk9PCCtOOHH1s//VeLjF86VUIsuxf
hNMfTgWT+arlHUIzl2Bp4c4Dq/T+hXTYlf+f5zGRAbBAd82eCqji5LTVvy/qaYt3
4gWSZnv/3VYHCx4MvV9UjtXQcSnS/ttDwyV0ZD6/NtYllQQobaCbyrE4p2tA1GIk
ql8ESRgXKK5Sio197Tm45UEkrhG0oUrEZ3riBaXeM9yOldLp1mLLVPGJeLcJDvpA
N7TDcM0owq/CMbmu5BkNv0xw7q/Vc9VQLGva8a15StxGjk1HY5/6KQWssCEkTkO7
HnIprATtWz2r0qgTcI+fOXndUmlqgQr1jhvYfJeuv45KbjoI5qubZvYr
=vHNj
-----END PGP SIGNATURE-----
Merge tag 'for-5.11/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Add DM verity support for signature verification with 2nd keyring
- Fix DM verity to skip verity work if IO completes with error while
system is shutting down
- Add new DM multipath "IO affinity" path selector that maps IO
destined to a given path to a specific CPU based on user provided
mapping
- Rename DM multipath path selector source files to have "dm-ps" prefix
- Add REQ_NOWAIT support to some other simple DM targets that don't
block in more elaborate ways waiting for IO
- Export DM crypt's kcryptd workqueue via sysfs (WQ_SYSFS)
- Fix error return code in DM's target_message() if empty message is
received
- A handful of other small cleanups
* tag 'for-5.11/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache: simplify the return expression of load_mapping()
dm ebs: avoid double unlikely() notation when using IS_ERR()
dm verity: skip verity work if I/O error when system is shutting down
dm crypt: export sysfs of kcryptd workqueue
dm ioctl: fix error return code in target_message
dm crypt: Constify static crypt_iv_operations
dm: add support for REQ_NOWAIT to various targets
dm: rename multipath path selector source files to have "dm-ps" prefix
dm mpath: add IO affinity path selector
dm verity: Add support for signature verification with 2nd keyring
dm: remove unnecessary current->bio_list check when submitting split bio
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/Xec8QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpoLbEACzXypgZWwMdfgRckA/Vt333rXHtbhUV+hK
2XP+P81iRvr9Esi31UPbRp82vrgcDO0cpI1QmQojS5U5TIQP88BfXptfRZZu48eb
wT5RDDNQ34HItqAh/yEuYsv9yUKcxeIrB99tBVvM+4UmQg9zTdIW3mg6PvCBdbhV
N38jI0tCF/PJatjfRuphT/nXonQLPWBlVDmZk06KZQFOwQe9ep1vUi1+nbiRPuo3
geFBpTh1Kp6Vl1B3n4RpECs6Y7I0RRuJdaH2sDizICla1/BW91F9fQwHimNnUxUq
e1Q1kMuh6ftcQGkYlHSYcPhuv6CvorldTZCO5arPxWpcwvxriTSMRPWAgUr5pEiF
fhiGhqeDu9e6vl9vS31wUD1B30hy+jFz9wyjRrDwJ3cPHH1JVBjTzvdX+cIh/1ku
IbIwUMteUtvUrzqAv/DzbGhedp7xWtOFaVo8j0QFYh9zkjd6b8yDOF/yztwX2gjY
Xt1cd+KpDSiN449ZRaoMI0sCJAxqzhMa6nsWlb0L7KuNyWKAbvKQBm9Rb47FLV9A
Vx70KC+zkFoyw23capvIahmQazerriUJ5PGe0lVm6ROgmIFdCpXTPDjnrvq/6RZ/
GEpD7gTW9atGJ7EuEE8686sAfKD5kneChWLX5EHXf0d0AG5Mr2lKsluiGp5LpPJg
Q1Xqs6xwww==
=zo4w
-----END PGP SIGNATURE-----
Merge tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
"Another series of killing more code than what is being added, again
thanks to Christoph's relentless cleanups and tech debt tackling.
This contains:
- blk-iocost improvements (Baolin Wang)
- part0 iostat fix (Jeffle Xu)
- Disable iopoll for split bios (Jeffle Xu)
- block tracepoint cleanups (Christoph Hellwig)
- Merging of struct block_device and hd_struct (Christoph Hellwig)
- Rework/cleanup of how block device sizes are updated (Christoph
Hellwig)
- Simplification of gendisk lookup and removal of block device
aliasing (Christoph Hellwig)
- Block device ioctl cleanups (Christoph Hellwig)
- Removal of bdget()/blkdev_get() as exported API (Christoph Hellwig)
- Disk change rework, avoid ->revalidate_disk() (Christoph Hellwig)
- sbitmap improvements (Pavel Begunkov)
- Hybrid polling fix (Pavel Begunkov)
- bvec iteration improvements (Pavel Begunkov)
- Zone revalidation fixes (Damien Le Moal)
- blk-throttle limit fix (Yu Kuai)
- Various little fixes"
* tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block: (126 commits)
blk-mq: fix msec comment from micro to milli seconds
blk-mq: update arg in comment of blk_mq_map_queue
blk-mq: add helper allocating tagset->tags
Revert "block: Fix a lockdep complaint triggered by request queue flushing"
nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class
blk-mq: add new API of blk_mq_hctx_set_fq_lock_class
block: disable iopoll for split bio
block: Improve blk_revalidate_disk_zones() checks
sbitmap: simplify wrap check
sbitmap: replace CAS with atomic and
sbitmap: remove swap_lock
sbitmap: optimise sbitmap_deferred_clear()
blk-mq: skip hybrid polling if iopoll doesn't spin
blk-iocost: Factor out the base vrate change into a separate function
blk-iocost: Factor out the active iocgs' state check into a separate function
blk-iocost: Move the usage ratio calculation to the correct place
blk-iocost: Remove unnecessary advance declaration
blk-iocost: Fix some typos in comments
blktrace: fix up a kerneldoc comment
block: remove the request_queue to argument request based tracepoints
...
The depth-first splitting is introduced in commit 18a25da843 ("dm:
ensure bio submission follows a depth-first tree walk"), which is used
to fix the potential deadlock in case of the misordering handling of
bios caused by bio_list. There're two paths submitting split bios,
dm_wq_work() from worker thread and submit_bio() from application. Back
upon that time, dm_wq_work() thread calls __split_and_process_bio()
directly and thus will not trigger this issue since bio_list doesn't
exist here. So this issue will only be triggered from application
calling submit_bio(), and the fix has to check if current->bio_list is
non-NULL to distinguish this case.
However since commit 0c2915b8c6 ("dm: fix missing imposition of
queue_limits from dm_wq_work() thread"), dm_wq_work() thread calls
submit_bio_noacct() and thus also uses bio_list. Since then all entries
into __split_and_process_bio() are under protection of bio_list, and
thus the checking of current->bio_list when determinning if the
depth-first principle should be used, seems kind of nonsense. After all
the checking always succeeds now.
Fixes: 0c2915b8c6 ("dm: fix missing imposition of queue_limits from dm_wq_work() thread")
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fixes sparse warnings:
drivers/md/dm.c:508:12: warning: context imbalance in 'dm_prepare_ioctl' - wrong count at exit
drivers/md/dm.c:543:13: warning: context imbalance in 'dm_unprepare_ioctl' - wrong count at exit
Fixes: 971888c469 ("dm: hold DM table for duration of ioctl rather than use blkdev_get")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Commit 882ec4e609 ("dm table: stack 'chunk_sectors' limit to account
for target-specific splitting") caused a couple regressions:
1) Using lcm_not_zero() when stacking chunk_sectors was a bug because
chunk_sectors must reflect the most limited of all devices in the
IO stack.
2) DM targets that set max_io_len but that do _not_ provide an
.iterate_devices method no longer had there IO split properly.
And commit 5091cdec56 ("dm: change max_io_len() to use
blk_max_size_offset()") also caused a regression where DM no longer
supported varied (per target) IO splitting. The implication being the
potential for severely reduced performance for IO stacks that use a DM
target like dm-cache to hide performance limitations of a slower
device (e.g. one that requires 4K IO splitting).
Coming full circle: Fix all these issues by discontinuing stacking
chunk_sectors up using ti->max_io_len in dm_calculate_queue_limits(),
add optional chunk_sectors override argument to blk_max_size_offset()
and update DM's max_io_len() to pass ti->max_io_len to its
blk_max_size_offset() call.
Passing in an optional chunk_sectors override to blk_max_size_offset()
allows for code reuse of block's centralized calculation for max IO
size based on provided offset and split boundary.
Fixes: 882ec4e609 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting")
Fixes: 5091cdec56 ("dm: change max_io_len() to use blk_max_size_offset()")
Cc: stable@vger.kernel.org
Reported-by: John Dorminy <jdorminy@redhat.com>
Reported-by: Bruce Johnston <bjohnsto@redhat.com>
Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: John Dorminy <jdorminy@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
The request_queue can trivially be derived from the bio.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The request_queue can trivially be derived from the bio.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We can just dereference the point in struct gendisk instead. Also
remove the now unused export.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use struct block_device to lookup partitions on a disk. This removes
all usage of struct hd_struct from the I/O path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Allocate hd_struct together with struct block_device to pre-load
the lifetime rule changes in preparation of merging the two structures.
Note that part0 was previously embedded into struct gendisk, but is
a separate allocation now, and already points to the block_device instead
of the hd_struct. The lifetime of struct gendisk is still controlled by
the struct device embedded in the part0 hd_struct.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Get rid of the long-lasting struct block_device reference in
struct mapped_device. The only remaining user is the freeze code,
where we can trivially look up the block device at freeze time
and release the reference at thaw time.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We don't really need the struct block_device to initialize a bio. So
switch from using bio_set_dev to manually setting up bi_disk (bi_partno
will always be zero and has been cleared by bio_init already).
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Store the frozen superblock in struct block_device to avoid the awkward
interface that can return a sb only used a cookie, an ERR_PTR or NULL.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The dm_get_live_table() function makes RCU read lock so
dm_put_live_table() must be called even if dm_table map is not found.
Fixes: e76239a374 ("block: add a report_zones method")
Cc: stable@vger.kernel.org
Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Use set_capacity_and_notify to set the size of both the disk and block
device. This also gets the uevent notifications for the resize for free.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
fix bio splitting for bios that were deferred to the worker thread
due to a DM device being suspended.
- Remove DM core's special handling of NVMe devices now that block
core has internalized efficiencies drivers previously needed to
be concerned about (via now removed direct_make_request).
- Fix request-based DM to not bounce through indirect dm_submit_bio;
instead have block core make direct call to blk_mq_submit_bio().
- Various DM core cleanups to simplify and improve code.
- Update DM cryot to not use drivers that set
CRYPTO_ALG_ALLOCATES_MEMORY.
- Fix DM raid's raid1 and raid10 discard limits for the purposes of
linux-stable. But then remove DM raid's discard limits settings now
that MD raid can efficiently handle large discards.
- A couple small cleanups across various targets.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAl+Fx1gTHHNuaXR6ZXJA
cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWk5iB/9pONYmtfQ5oBx4jg/PU8cVYYIfOtwS
ZtItFbw7T9bkHVZ8d4hDr5LTq898cADuRD5edlR82gDOcXkiJlb5PqU39RoOTVvF
Xz87sWzHdGAK7rdnCMAc2hiX3oQOje9o7NxGeGQ/uPaNU+U/vJS0AZtEAwltocBd
j9MGESddBC636Gzbg5C0c0frikXd0am6qp6SCYJNpP5I0G2beHk2YX5Jqt9c7zMk
8kyQend5b5RvkPNWTAjkVfWUsIjwYHh6MF48ZoGvD0X3lWjIBiwyxC0UX5hSXq63
kB+nqxbXcvQLEBtJuDZ2bjyvrwzCVLpmfgLgzxOOU8fI5Q2U0zpsPaa0
=6YDu
-----END PGP SIGNATURE-----
Merge tag 'for-5.10/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Improve DM core's bio splitting to use blk_max_size_offset(). Also
fix bio splitting for bios that were deferred to the worker thread
due to a DM device being suspended.
- Remove DM core's special handling of NVMe devices now that block core
has internalized efficiencies drivers previously needed to be
concerned about (via now removed direct_make_request).
- Fix request-based DM to not bounce through indirect dm_submit_bio;
instead have block core make direct call to blk_mq_submit_bio().
- Various DM core cleanups to simplify and improve code.
- Update DM cryot to not use drivers that set
CRYPTO_ALG_ALLOCATES_MEMORY.
- Fix DM raid's raid1 and raid10 discard limits for the purposes of
linux-stable. But then remove DM raid's discard limits settings now
that MD raid can efficiently handle large discards.
- A couple small cleanups across various targets.
* tag 'for-5.10/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: fix request-based DM to not bounce through indirect dm_submit_bio
dm: remove special-casing of bio-based immutable singleton target on NVMe
dm: export dm_copy_name_and_uuid
dm: fix comment in __dm_suspend()
dm: fold dm_process_bio() into dm_submit_bio()
dm: fix missing imposition of queue_limits from dm_wq_work() thread
dm snap persistent: simplify area_io()
dm thin metadata: Remove unused local variable when create thin and snap
dm raid: remove unnecessary discard limits for raid10
dm raid: fix discard limits for raid1 and raid10
dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
dm: use dm_table_get_device_name() where appropriate in targets
dm table: make 'struct dm_table' definition accessible to all of DM core
dm: eliminate need for start_io_acct() forward declaration
dm: simplify __process_abnormal_io()
dm: push use of on-stack flush_bio down to __send_empty_flush()
dm: optimize max_io_len() by inlining max_io_len_target_boundary()
dm: push md->immutable_target optimization down to __process_bio()
dm: change max_io_len() to use blk_max_size_offset()
dm table: stack 'chunk_sectors' limit to account for target-specific splitting
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+EWUgQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpnoxEADCVSNBRkpV0OVkOEC3wf8EGhXhk01Jnjtl
u5Mg2V55hcgJ0thQxBV/V28XyqmsEBrmAVi0Yf8Vr9Qbq4Ze08Wae4ChS4rEOyh1
jTcGYWx5aJB3ChLvV/HI0nWQ3bkj03mMrL3SW8rhhf5DTyKHsVeTenpx42Qu/FKf
fRzi09FSr3Pjd0B+EX6gunwJnlyXQC5Fa4AA0GhnXJzAznANXxHkkcXu8a6Yw75x
e28CfhIBliORsK8sRHLoUnPpeTe1vtxCBhBMsE+gJAj9ZUOWMzvNFIPP4FvfawDy
6cCQo2m1azJ/IdZZCDjFUWyjh+wxdKMp+NNryEcoV+VlqIoc3n98rFwrSL+GIq5Z
WVwEwq+AcwoMCsD29Lu1ytL2PQ/RVqcJP5UheMrbL4vzefNfJFumQVZLIcX0k943
8dFL2QHL+H/hM9Dx5y5rjeiWkAlq75v4xPKVjh/DHb4nehddCqn/+DD5HDhNANHf
c1kmmEuYhvLpIaC4DHjE6DwLh8TPKahJjwsGuBOTr7D93NUQD+OOWsIhX6mNISIl
FFhP8cd0/ZZVV//9j+q+5B4BaJsT+ZtwmrelKFnPdwPSnh+3iu8zPRRWO+8P8fRC
YvddxuJAmE6BLmsAYrdz6Xb/wqfyV44cEiyivF0oBQfnhbtnXwDnkDWSfJD1bvCm
ZwfpDh2+Tg==
=LzyE
-----END PGP SIGNATURE-----
Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
- Series of merge handling cleanups (Baolin, Christoph)
- Series of blk-throttle fixes and cleanups (Baolin)
- Series cleaning up BDI, seperating the block device from the
backing_dev_info (Christoph)
- Removal of bdget() as a generic API (Christoph)
- Removal of blkdev_get() as a generic API (Christoph)
- Cleanup of is-partition checks (Christoph)
- Series reworking disk revalidation (Christoph)
- Series cleaning up bio flags (Christoph)
- bio crypt fixes (Eric)
- IO stats inflight tweak (Gabriel)
- blk-mq tags fixes (Hannes)
- Buffer invalidation fixes (Jan)
- Allow soft limits for zone append (Johannes)
- Shared tag set improvements (John, Kashyap)
- Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)
- DM no-wait support (Mike, Konstantin)
- Request allocation improvements (Ming)
- Allow md/dm/bcache to use IO stat helpers (Song)
- Series improving blk-iocost (Tejun)
- Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
Xianting, Yang, Yufen, yangerkun)
* tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
block: fix uapi blkzoned.h comments
blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
blk-mq: get rid of the dead flush handle code path
block: get rid of unnecessary local variable
block: fix comment and add lockdep assert
blk-mq: use helper function to test hw stopped
block: use helper function to test queue register
block: remove redundant mq check
block: invoke blk_mq_exit_sched no matter whether have .exit_sched
percpu_ref: don't refer to ref->data if it isn't allocated
block: ratelimit handle_bad_sector() message
blk-throttle: Re-use the throtl_set_slice_end()
blk-throttle: Open code __throtl_de/enqueue_tg()
blk-throttle: Move service tree validation out of the throtl_rb_first()
blk-throttle: Move the list operation after list validation
blk-throttle: Fix IO hang for a corner case
blk-throttle: Avoid tracking latency if low limit is invalid
blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
block: Remove redundant 'return' statement
...
It is unnecessary to force request-based DM to call into bio-based
dm_submit_bio (via indirect disk->fops->submit_bio) only to have it then
call blk_mq_submit_bio().
Fix this by establishing a request-based DM block_device_operations
(dm_rq_blk_dops, which doesn't have .submit_bio) and update
dm_setup_md_queue() to set md->disk->fops to it for
DM_TYPE_REQUEST_BASED.
Remove DM_TYPE_REQUEST_BASED conditional in dm_submit_bio and unexport
blk_mq_submit_bio.
Fixes: c62b37d96b ("block: move ->make_request_fn to struct block_device_operations")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Since commit 5a6c35f9af ("block: remove direct_make_request") there
is no benefit to DM special-casing NVMe. Remove all code used to
establish DM_TYPE_NVME_BIO_BASED.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
bio_crypt_clone() assumes its gfp_mask argument always includes
__GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed.
However, bio_crypt_clone() might be called with GFP_ATOMIC via
setup_clone() in drivers/md/dm-rq.c, or with GFP_NOWAIT via
kcryptd_io_read() in drivers/md/dm-crypt.c.
Neither case is currently reachable with a bio that actually has an
encryption context. However, it's fragile to rely on this. Just make
bio_crypt_clone() able to fail, analogous to bio_integrity_clone().
Reported-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Satya Tangirala <satyat@google.com>
Cc: Satya Tangirala <satyat@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
dm_process_bio() is only called by dm_submit_bio(), there is no benefit
to keeping dm_process_bio() factored out, so fold it.
While at it, cleanup dm_submit_bio()'s DMF_BLOCK_IO_FOR_SUSPEND related
branching and expand scope of dm_get_live_table() rcu reference on map
via common 'out' label to dm_put_live_table().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If a DM device was suspended when bios were issued to it, those bios
would be deferred using queue_io(). Once the DM device was resumed
dm_process_bio() could be called by dm_wq_work() for original bio that
still needs splitting. dm_process_bio()'s check for current->bio_list
(meaning call chain is within ->submit_bio) as a prerequisite for
calling blk_queue_split() for "abnormal IO" would result in
dm_process_bio() never imposing corresponding queue_limits
(e.g. discard_granularity, discard_max_bytes, etc).
Fix this by always having dm_wq_work() resubmit deferred bios using
submit_bio_noacct().
Side-effect is blk_queue_split() is always called for "abnormal IO" from
->submit_bio, be it from application thread or dm_wq_work() workqueue,
so proper bio splitting and depth-first bio submission is performed.
For sake of clarity, remove current->bio_list check before call to
blk_queue_split().
Also, remove dm_wq_work()'s use of dm_{get,put}_live_table() -- no
longer needed since IO will be reissued in terms of ->submit_bio.
And rename bio variable from 'c' to 'bio'.
Fixes: cf9c378655 ("dm: fix comment in dm_process_bio()")
Reported-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Move 'struct dm_table' definition from dm-table.c to dm-core.h and
update DM core to access its members directly.
Helps optimize max_io_len() and other methods slightly.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Only call bio_op() once in switch statement. Also remove the
excessive factoring out to one line functions.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Saves redundant dm_target_offset() math.
Also, reverse argument order for max_io_len() to be consistent with
other similar functions.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Using blk_max_size_offset() enables DM core's splitting to impose
ti->max_io_len (via q->limits.chunk_sectors) and also fallback to
respecting q->limits.max_sectors if chunk_sectors isn't set.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
DM depends on these block 5.10 commits:
22ada802ed block: use lcm_not_zero() when stacking chunk_sectors
07d098e6bb block: allow 'chunk_sectors' to be non-power-of-2
021a24460d block: add QUEUE_FLAG_NOWAIT
6abc49468e dm: add support for REQ_NOWAIT and enable it for linear target
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Add DM target feature flag DM_TARGET_NOWAIT which advertises that
target works with REQ_NOWAIT bios.
Add dm_table_supports_nowait() and update dm_table_set_restrictions()
to set/clear QUEUE_FLAG_NOWAIT accordingly.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Refer to the correct function (->submit_bio instead of ->queue_bio).
Also, add details about why using blk_queue_split() isn't needed for
dm_wq_work()'s call to dm_process_bio().
Fixes: c62b37d96b ("block: move ->make_request_fn to struct block_device_operations")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
dm_queue_split() is removed because __split_and_process_bio() _must_
handle splitting bios to ensure proper bio submission and completion
ordering as a bio is split.
Otherwise, multiple recursive calls to ->submit_bio will cause multiple
split bios to be allocated from the same ->bio_split mempool at the same
time. This would result in deadlock in low memory conditions because no
progress could be made (only one bio is available in ->bio_split
mempool).
This fix has been verified to still fix the loss of performance, due
to excess splitting, that commit 120c9257f5 provided.
Fixes: 120c9257f5 ("Revert "dm: always call blk_queue_split() in dm_process_bio()"")
Cc: stable@vger.kernel.org # 5.0+, requires custom backport due to 5.9 changes
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
A recent fix to the dm_dax_supported() flow uncovered a latent bug. When
dm_get_live_table() fails it is still required to drop the
srcu_read_lock(). Without this change the lvm2 test-suite triggers this
warning:
# lvm2-testsuite --only pvmove-abort-all.sh
WARNING: lock held when returning to user space!
5.9.0-rc5+ #251 Tainted: G OE
------------------------------------------------
lvm/1318 is leaving the kernel with locks still held!
1 lock held by lvm/1318:
#0: ffff9372abb5a340 (&md->io_barrier){....}-{0:0}, at: dm_get_live_table+0x5/0xb0 [dm_mod]
...and later on this hang signature:
INFO: task lvm:1344 blocked for more than 122 seconds.
Tainted: G OE 5.9.0-rc5+ #251
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:lvm state:D stack: 0 pid: 1344 ppid: 1 flags:0x00004000
Call Trace:
__schedule+0x45f/0xa80
? finish_task_switch+0x249/0x2c0
? wait_for_completion+0x86/0x110
schedule+0x5f/0xd0
schedule_timeout+0x212/0x2a0
? __schedule+0x467/0xa80
? wait_for_completion+0x86/0x110
wait_for_completion+0xb0/0x110
__synchronize_srcu+0xd1/0x160
? __bpf_trace_rcu_utilization+0x10/0x10
__dm_suspend+0x6d/0x210 [dm_mod]
dm_suspend+0xf6/0x140 [dm_mod]
Fixes: 7bf7eac8d6 ("dax: Arrange for dax_supported check to span multiple devices")
Cc: <stable@vger.kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Reported-by: Adrian Huang <ahuang12@lenovo.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Link: https://lore.kernel.org/r/160045867590.25663.7548541079217827340.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Two different callers use two different mutexes for updating the
block device size, which obviously doesn't help to actually protect
against concurrent updates from the different callers. In addition
one of the locks, bd_mutex is rather prone to deadlocks with other
parts of the block stack that use it for high level synchronization.
Switch to using a new spinlock protecting just the size updates, as
that is all we need, and make sure everyone does the update through
the proper helper.
This fixes a bug reported with the nvme revalidating disks during a
hot removal operation, which can currently deadlock on bd_mutex.
Reported-by: Xianting Tian <xianting_tian@126.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
bio-based code so that it follows patterns established by
request-based code.
- Request-based DM core improvement to eliminate unnecessary call to
blk_mq_queue_stopped().
- Add "panic_on_corruption" error handling mode to DM verity target.
- DM bufio fix to to perform buffer cleanup from a workqueue rather
than wait for IO in reclaim context from shrinker.
- DM crypt improvement to optionally avoid async processing via
workqueues for reads and/or writes -- via "no_read_workqueue" and
"no_write_workqueue" features. This more direct IO processing
improves latency and throughput with faster storage. Avoiding
workqueue IO submission for writes (DM_CRYPT_NO_WRITE_WORKQUEUE) is
a requirement for adding zoned block device support to DM crypt.
- Add zoned block device support to DM crypt. Makes use of
DM_CRYPT_NO_WRITE_WORKQUEUE and a new optional feature
(DM_CRYPT_WRITE_INLINE) that allows write completion to wait for
encryption to complete. This allows write ordering to be preserved,
which is needed for zoned block devices.
- Fix DM ebs target's check for REQ_OP_FLUSH.
- Fix DM core's report zones support to not report more zones than
were requested.
- A few small compiler warning fixes.
- DM dust improvements to return output directly to the user rather
than require they scrape the system log for output.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAl8tdOQTHHNuaXR6ZXJA
cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWvDlB/sF8svagDqeqs27xTxCiUPykD29cMmS
OGPr0Mp/BntZOBpSaTPM9s5XucP3WJhPsxet5qeoyM3OViSFx+O55PqPjn8C65y0
eGMa4zknd9eO1933+ijmyQu6VNr4sf/6nusX4xSGqv00UR22dJ+3pHtfN9ANDXYX
AAYA0Ve6UuOwAbGUCnRGI/2780aYY0B8Ok+cF21CskqryF+RpmbZ6BsR07+Hk4cy
LX5EaHUqezW12cibLq2f0l7TLLJ86OscvqyU9lGVIxiV57e2i5c2S1HvhKZu+Wn3
6CUmlOhGI0viCKgM1ArekZ+zOw9ROIaAKKPzC5mspqx9yuuCqdY8k8xV
=X3tt
-----END PGP SIGNATURE-----
Merge tag 'for-5.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- DM multipath locking fixes around m->flags tests and improvements to
bio-based code so that it follows patterns established by
request-based code.
- Request-based DM core improvement to eliminate unnecessary call to
blk_mq_queue_stopped().
- Add "panic_on_corruption" error handling mode to DM verity target.
- DM bufio fix to to perform buffer cleanup from a workqueue rather
than wait for IO in reclaim context from shrinker.
- DM crypt improvement to optionally avoid async processing via
workqueues for reads and/or writes -- via "no_read_workqueue" and
"no_write_workqueue" features. This more direct IO processing
improves latency and throughput with faster storage. Avoiding
workqueue IO submission for writes (DM_CRYPT_NO_WRITE_WORKQUEUE) is a
requirement for adding zoned block device support to DM crypt.
- Add zoned block device support to DM crypt. Makes use of
DM_CRYPT_NO_WRITE_WORKQUEUE and a new optional feature
(DM_CRYPT_WRITE_INLINE) that allows write completion to wait for
encryption to complete. This allows write ordering to be preserved,
which is needed for zoned block devices.
- Fix DM ebs target's check for REQ_OP_FLUSH.
- Fix DM core's report zones support to not report more zones than were
requested.
- A few small compiler warning fixes.
- DM dust improvements to return output directly to the user rather
than require they scrape the system log for output.
* tag 'for-5.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: don't call report zones for more than the user requested
dm ebs: Fix incorrect checking for REQ_OP_FLUSH
dm init: Set file local variable static
dm ioctl: Fix compilation warning
dm raid: Remove empty if statement
dm verity: Fix compilation warning
dm crypt: Enable zoned block device support
dm crypt: add flags to optionally bypass kcryptd workqueues
dm bufio: do buffer cleanup from a workqueue
dm rq: don't call blk_mq_queue_stopped() in dm_stop_queue()
dm dust: add interface to list all badblocks
dm dust: report some message results directly back to user
dm verity: add "panic_on_corruption" error handling mode
dm mpath: use double checked locking in fast path
dm mpath: rename current_pgpath to pgpath in multipath_prepare_ioctl
dm mpath: rework __map_bio()
dm mpath: factor out multipath_queue_bio
dm mpath: push locking down to must_push_back_rq()
dm mpath: take m->lock spinlock when testing QUEUE_IF_NO_PATH
dm mpath: changes from initial m->flags locking audit
Don't call report zones for more zones than the user actually requested,
otherwise this can lead to out-of-bounds accesses in the callback
functions.
Such a situation can happen if the target's ->report_zones() callback
function returns 0 because we've reached the end of the target and then
restart the report zones on the second target.
We're again calling into ->report_zones() and ultimately into the user
supplied callback function but when we're not subtracting the number of
zones already processed this may lead to out-of-bounds accesses in the
user callbacks.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Fixes: d41003513e ("block: rework zone reporting")
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl8m7YwQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpt+dEAC7a0HYuX2OrkyawBnsgd1QQR/soC7surec
yDDa7SMM8cOq3935bfzcYHV9FWJszEGIknchiGb9R3/T+vmSohbvDsM5zgwya9u/
FHUIuTq324I6JWXKl30k4rwjiX9wQeMt+WZ5gC8KJYCWA296i2IpJwd0A45aaKuS
x4bTjxqknE+fD4gQiMUSt+bmuOUAp81fEku3EPapCRYDPAj8f5uoY7R2arT/POwB
b+s+AtXqzBymIqx1z0sZ/XcdZKmDuhdurGCWu7BfJFIzw5kQ2Qe3W8rUmrQ3pGut
8a21YfilhUFiBv+B4wptfrzJuzU6Ps0BXHCnBsQjzvXwq5uFcZH495mM/4E4OJvh
SbjL2K4iFj+O1ngFkukG/F8tdEM1zKBYy2ZEkGoWKUpyQanbAaGI6QKKJA+DCdBi
yPEb7yRAa5KfLqMiocm1qCEO1I56HRiNHaJVMqCPOZxLmpXj19Fs71yIRplP1Trv
GGXdWZsccjuY6OljoXWdEfnxAr5zBsO3Yf2yFT95AD+egtGsU1oOzlqAaU1mtflw
ABo452pvh6FFpxGXqz6oK4VqY4Et7WgXOiljA4yIGoPpG/08L1Yle4eVc2EE01Jb
+BL49xNJVeUhGFrvUjPGl9kVMeLmubPFbmgrtipW+VRg9W8+Yirw7DPP6K+gbPAR
RzAUdZFbWw==
=abJG
-----END PGP SIGNATURE-----
Merge tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:
"Good amount of cleanups and tech debt removals in here, and as a
result, the diffstat shows a nice net reduction in code.
- Softirq completion cleanups (Christoph)
- Stop using ->queuedata (Christoph)
- Cleanup bd claiming (Christoph)
- Use check_events, moving away from the legacy media change
(Christoph)
- Use inode i_blkbits consistently (Christoph)
- Remove old unused writeback congestion bits (Christoph)
- Cleanup/unify submission path (Christoph)
- Use bio_uninit consistently, instead of bio_disassociate_blkg
(Christoph)
- sbitmap cleared bits handling (John)
- Request merging blktrace event addition (Jan)
- sysfs add/remove race fixes (Luis)
- blk-mq tag fixes/optimizations (Ming)
- Duplicate words in comments (Randy)
- Flush deferral cleanup (Yufen)
- IO context locking/retry fixes (John)
- struct_size() usage (Gustavo)
- blk-iocost fixes (Chengming)
- blk-cgroup IO stats fixes (Boris)
- Various little fixes"
* tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block: (135 commits)
block: blk-timeout: delete duplicated word
block: blk-mq-sched: delete duplicated word
block: blk-mq: delete duplicated word
block: genhd: delete duplicated words
block: elevator: delete duplicated word and fix typos
block: bio: delete duplicated words
block: bfq-iosched: fix duplicated word
iocost_monitor: start from the oldest usage index
iocost: Fix check condition of iocg abs_vdebt
block: Remove callback typedefs for blk_mq_ops
block: Use non _rcu version of list functions for tag_set_list
blk-cgroup: show global disk stats in root cgroup io.stat
blk-cgroup: make iostat functions visible to stat printing
block: improve discard bio alignment in __blkdev_issue_discard()
block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers
block: defer flush request no matter whether we have elevator
block: make blk_timeout_init() static
block: remove retry loop in ioc_release_fn()
block: remove unnecessary ioc nested locking
block: integrate bd_start_claiming into __blkdev_get
...
Commit adc0daad36 ("dm: report suspended
device during destroy") broke integrity recalculation.
The problem is dm_suspended() returns true not only during suspend,
but also during resume. So this race condition could occur:
1. dm_integrity_resume calls queue_work(ic->recalc_wq, &ic->recalc_work)
2. integrity_recalc (&ic->recalc_work) preempts the current thread
3. integrity_recalc calls if (unlikely(dm_suspended(ic->ti))) goto unlock_ret;
4. integrity_recalc exits and no recalculating is done.
To fix this race condition, add a function dm_post_suspending that is
only true during the postsuspend phase and use it instead of
dm_suspended().
Signed-off-by: Mikulas Patocka <mpatocka redhat com>
Fixes: adc0daad36 ("dm: report suspended device during destroy")
Cc: stable vger kernel org # v4.18+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Except for pktdvd, the only places setting congested bits are file
systems that allocate their own backing_dev_info structures. And
pktdvd is a deprecated driver that isn't useful in stack setup
either. So remove the dead congested_fn stacking infrastructure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Song Liu <song@kernel.org>
Acked-by: David Sterba <dsterba@suse.com>
[axboe: fixup unused variables in bcache/request.c]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We never set any congested bits in the group writeback instances of it.
And for the simpler bdi-wide case a simple scalar field is all that
that is needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
kobject_uevent may allocate memory and it may be called while there are dm
devices suspended. The allocation may recurse into a suspended device,
causing a deadlock. We must set the noio flag when sending a uevent.
The observed deadlock was reported here:
https://www.redhat.com/archives/dm-devel/2020-March/msg00025.html
Reported-by: Khazhismel Kumykov <khazhy@google.com>
Reported-by: Tahsin Erdogan <tahsin@google.com>
Reported-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
bio_uninit is the proper API to clean up a BIO that has been allocated
on stack or inside a structure that doesn't come from the BIO allocator.
Switch dm to use that instead of bio_disassociate_blkg, which really is
an implementation detail. Note that the bio_uninit calls are also moved
to the two callers of __send_empty_flush, so that they better pair with
the bio_init calls used to initialize them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Given request-based DM now uses blk-mq's blk_mq_queue_inflight() to
determine if outstanding IO has completed (and DM has no control over
the blk-mq state machine used to track outstanding IO) it is unsafe to
wakeup waiter (dm_wait_for_completion) before blk-mq has cleared a
request's state bits (e.g. MQ_RQ_IN_FLIGHT or MQ_RQ_COMPLETE). As
such dm_wait_for_completion() could be left to wait indefinitely if no
other requests complete.
Fix this by eliminating request-based DM's use of waitqueue to wait
for blk-mq requests to complete in dm_wait_for_completion.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Depends-on: 3c94d83cb3 ("blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Now that submit_bio_noacct has a decent blk-mq fast path there is no
more need for this bypass.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
generic_make_request has always been very confusingly misnamed, so rename
it to submit_bio_noacct to make it clear that it is submit_bio minus
accounting and a few checks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The make_request_fn is a little weird in that it sits directly in
struct request_queue instead of an operation vector. Replace it with
a block_device_operations method called submit_bio (which describes much
better what it does). Also remove the request_queue argument to it, as
the queue can be derived pretty trivially from the bio.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The queue can be trivially derived from the bio, so pass one less
argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of setting up the queuedata as well just use one private data
field.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
bio_uninit is the proper API to clean up a BIO that has been allocated
on stack or inside a structure that doesn't come from the BIO allocator.
Switch dm to use that instead of bio_disassociate_blkg, which really is
an implementation detail. Note that the bio_uninit calls are also moved
to the two callers of __send_empty_flush, so that they better pair with
the bio_init calls used to initialize them.
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Naohiro reported that issuing zone-append bios to a zoned block device
underneath a dm-linear device does not work as expected.
This because we forgot to reverse-map the sector the device wrote to the
original bio.
For zone-append bios, get the offset in the zone of the written sector
from the clone bio and add that to the original bio's sector position.
Fixes: 0512a75b98 ("block: Introduce REQ_OP_ZONE_APPEND")
Cc: stable@vger.kernel.org
Reported-by: Naohiro Aota <Naohiro.Aota@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
version 2 feature that adds support for pairing regular block
devices with a zoned device to ease performance impact associated
with finite random zones of zoned device. Changes came in 3
batches: first prepared for and then added the ability to pair a
single regular block device, second was a batch of fixes to improve
zoned's reclaim heuristic, third removed the limitation of only
adding a single additional regular block device to allow many
devices. Testing has shown linear scaling as more devices are
added.
- Add new emulated block size (ebs) target that emulates a smaller
logical_block_size than a block device supports. Primary use-case
is to emulate "512e" devices that have 512 byte logical_block_size
and 4KB physical_block_size. This is useful to some legacy
applications otherwise wouldn't be ablee to be used on 4K devices
because they depend on issuing IO in 512 byte granularity.
- Add discard interfaces to DM bufio. First consumer of the interface
is the dm-ebs target that makes heavy use of dm-bufio.
- Fix DM crypt's block queue_limits stacking to not truncate
logic_block_size.
- Add Documentation for DM integrity's status line.
- Switch DMDEBUG from a compile time config option to instead use
dynamic debug via pr_debug.
- Fix DM multipath target's hueristic for how it manages
"queue_if_no_path" state internally. DM multipath now avoids
disabling "queue_if_no_path" unless it is actually needed (e.g. in
response to configure timeout or explicit "fail_if_no_path"
message). This fixes reports of spurious -EIO being reported back
to userspace application during fault tolerance testing with an NVMe
backend. Added various dynamic DMDEBUG messages to assist with
debugging queue_if_no_path in the future.
- Add a new DM multipath "Historical Service Time" Path Selector.
- Fix DM multipath's dm_blk_ioctl() to switch paths on IO error.
- Improve DM writecache target performance by using explicit
cache flushing for target's single-threaded usecase and a small
cleanup to remove unnecessary test in persistent_memory_claim.
- Other small cleanups in DM core, dm-persistent-data, and DM integrity.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAl7alrgTHHNuaXR6ZXJA
cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWl42B/9sBd+j60emy4Bliu/f3pd7SEkFrSXv
K2jXicRFx4E5kO0aLK+fX65cOiq2vvLsDh8c++0TLXcD9q7oK0qxK9c8TCPq29Cx
W2J2dwdjyyqqbr3/FZHYYM9KLOl5rsxJLygXwhJhQ2Gny44L7nVACrAXNzXIHJ3r
f8xr+GLdF/jz7WLj8bwEDo3Bf8wvxDvl2ijqj7EceOhTutNE8xHQ6UcTPqTtozJy
sNM8UQNk1L43DBAvXfrKZ+yQ5DYAKdXKJpV9C8qv5DEGbaEikuMrHddgO4KlDdp4
VjPk9GSfPwGcJ4ecN8vgecZVGvh52ZU7OZ8qey/q0zqps74jeHTZQQyM
=jRun
-----END PGP SIGNATURE-----
Merge tag 'for-5.8/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- The largest change for this cycle is the DM zoned target's metadata
version 2 feature that adds support for pairing regular block devices
with a zoned device to ease the performance impact associated with
finite random zones of zoned device.
The changes came in three batches: the first prepared for and then
added the ability to pair a single regular block device, the second
was a batch of fixes to improve zoned's reclaim heuristic, and the
third removed the limitation of only adding a single additional
regular block device to allow many devices.
Testing has shown linear scaling as more devices are added.
- Add new emulated block size (ebs) target that emulates a smaller
logical_block_size than a block device supports
The primary use-case is to emulate "512e" devices that have 512 byte
logical_block_size and 4KB physical_block_size. This is useful to
some legacy applications that otherwise wouldn't be able to be used
on 4K devices because they depend on issuing IO in 512 byte
granularity.
- Add discard interfaces to DM bufio. First consumer of the interface
is the dm-ebs target that makes heavy use of dm-bufio.
- Fix DM crypt's block queue_limits stacking to not truncate
logic_block_size.
- Add Documentation for DM integrity's status line.
- Switch DMDEBUG from a compile time config option to instead use
dynamic debug via pr_debug.
- Fix DM multipath target's hueristic for how it manages
"queue_if_no_path" state internally.
DM multipath now avoids disabling "queue_if_no_path" unless it is
actually needed (e.g. in response to configure timeout or explicit
"fail_if_no_path" message).
This fixes reports of spurious -EIO being reported back to userspace
application during fault tolerance testing with an NVMe backend.
Added various dynamic DMDEBUG messages to assist with debugging
queue_if_no_path in the future.
- Add a new DM multipath "Historical Service Time" Path Selector.
- Fix DM multipath's dm_blk_ioctl() to switch paths on IO error.
- Improve DM writecache target performance by using explicit cache
flushing for target's single-threaded usecase and a small cleanup to
remove unnecessary test in persistent_memory_claim.
- Other small cleanups in DM core, dm-persistent-data, and DM
integrity.
* tag 'for-5.8/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (62 commits)
dm crypt: avoid truncating the logical block size
dm mpath: add DM device name to Failing/Reinstating path log messages
dm mpath: enhance queue_if_no_path debugging
dm mpath: restrict queue_if_no_path state machine
dm mpath: simplify __must_push_back
dm zoned: check superblock location
dm zoned: prefer full zones for reclaim
dm zoned: select reclaim zone based on device index
dm zoned: allocate zone by device index
dm zoned: support arbitrary number of devices
dm zoned: move random and sequential zones into struct dmz_dev
dm zoned: per-device reclaim
dm zoned: add metadata pointer to struct dmz_dev
dm zoned: add device pointer to struct dm_zone
dm zoned: allocate temporary superblock for tertiary devices
dm zoned: convert to xarray
dm zoned: add a 'reserved' zone flag
dm zoned: improve logging messages for reclaim
dm zoned: avoid unnecessary device recalulation for secondary superblock
dm zoned: add debugging message for reading superblocks
...
Switch dm to use the nicer bio accounting helpers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that DMDEBUG uses pr_debug and DMDEBUG_LIMIT uses
pr_debug_ratelimited cleanup DM's 2 direct pr_debug callers to use
them to get the benefit of consistent DM_FMT formatting of debugging
messages.
While doing so, dm-mpath.c:dm_report_EIO() was switched over to using
DMDEBUG_LIMIT due to the potential for error handling floods in the IO
completion path.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
blk_mq_make_request currently needs to grab an q_usage_counter
reference when allocating a request. This is because the block layer
grabs one before calling blk_mq_make_request, but also releases it as
soon as blk_mq_make_request returns. Remove the blk_queue_exit call
after blk_mq_make_request returns, and instead let it consume the
reference. This works perfectly fine for the block layer caller, just
device mapper needs an extra reference as the old problem still
persists there. Open code blk_queue_enter_live in device mapper,
as there should be no other callers and this allows better documenting
why we do a non-try get.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The HST path selector needs this information to perform path
prediction. For request-based mpath, struct request's io_start_time_ns
is used, while for bio-based, use the start_time stored in dm_io.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
We must have some way of letting a storage device driver know what
encryption context it should use for en/decrypting a request. However,
it's the upper layers (like the filesystem/fscrypt) that know about and
manages encryption contexts. As such, when the upper layer submits a bio
to the block layer, and this bio eventually reaches a device driver with
support for inline encryption, the device driver will need to have been
told the encryption context for that bio.
We want to communicate the encryption context from the upper layer to the
storage device along with the bio, when the bio is submitted to the block
layer. To do this, we add a struct bio_crypt_ctx to struct bio, which can
represent an encryption context (note that we can't use the bi_private
field in struct bio to do this because that field does not function to pass
information across layers in the storage stack). We also introduce various
functions to manipulate the bio_crypt_ctx and make the bio/request merging
logic aware of the bio_crypt_ctx.
We also make changes to blk-mq to make it handle bios with encryption
contexts. blk-mq can merge many bios into the same request. These bios need
to have contiguous data unit numbers (the necessary changes to blk-merge
are also made to ensure this) - as such, it suffices to keep the data unit
number of just the first bio, since that's all a storage driver needs to
infer the data unit number to use for each data block in each bio in a
request. blk-mq keeps track of the encryption context to be used for all
the bios in a request with the request's rq_crypt_ctx. When the first bio
is added to an empty request, blk-mq will program the encryption context
of that bio into the request_queue's keyslot manager, and store the
returned keyslot in the request's rq_crypt_ctx. All the functions to
operate on encryption contexts are in blk-crypto.c.
Upper layers only need to call bio_crypt_set_ctx with the encryption key,
algorithm and data_unit_num; they don't have to worry about getting a
keyslot for each encryption context, as blk-mq/blk-crypto handles that.
Blk-crypto also makes it possible for request-based layered devices like
dm-rq to make use of inline encryption hardware by cloning the
rq_crypt_ctx and programming a keyslot in the new request_queue when
necessary.
Note that any user of the block layer can submit bios with an
encryption context, such as filesystems, device-mapper targets, etc.
Signed-off-by: Satya Tangirala <satyat@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Call blk_mq_make_request when no ->make_request_fn is set. This is
safe now that blk_alloc_queue always sets up the pointer for make_request
based drivers. This avoids an indirect call in the blk-mq driver I/O
fast path, which is rather expensive due to spectre mitigations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- Add support for region alignment configuration and enforcement to
fix compatibility across architectures and PowerPC page size
configurations.
- Introduce 'zero_page_range' as a dax operation. This facilitates
filesystem-dax operation without a block-device.
- Introduce phys_to_target_node() to facilitate drivers that want to
know resulting numa node if a given reserved address range was
onlined.
- Advertise a persistence-domain for of_pmem and papr_scm. The
persistence domain indicates where cpu-store cycles need to reach in
the platform-memory subsystem before the platform will consider them
power-fail protected.
- Promote numa_map_to_online_node() to a cross-kernel generic facility.
- Save x86 numa information to allow for node-id lookups for reserved
memory ranges, deploy that capability for the e820-pmem driver.
- Pick up some miscellaneous minor fixes, that missed v5.6-final,
including a some smatch reports in the ioctl path and some unit test
compilation fixups.
- Fixup some flexible-array declarations.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEf41QbsdZzFdA8EfZHtKRamZ9iAIFAl6LtIAACgkQHtKRamZ9
iAIwRA/8CLVVuQpgHQ1tqK4h8CZPrISFXh7wy7uhocEU2xrDh6iGVnLztmoLRr2k
5f8T9lRzreSAwIVL5DbGqP1pFncqIt9VMnKsFlaPMBGCBNR+hURY0iBCNjIT+jiq
BOzLd52MR2rqJxeXGTMUbWrBrbmuj4mZPdmGVuFFe7GFRpoaVpCgOo+296eWa/ot
gIOFUTonZY7STYjNvDok0TXCmiCFuJb+P+y5ldfCPShHvZhTiaF53jircja8vAjO
G5dt8ixBKUK0rXRc4SEQsQhAZNcAFHb6Gy5lg4C2QzhTF374xTc9usJZNWbIE9iM
5mipBYvjVuoY+XaCNZDkaRcJIy/jqB15O6l3QIWbZLGaK9m95YPp9LmkPFwd3JpO
e3rO24ML471DxqB9iWIiJCNcBBocLOlnd6qAQTpppWDpGNbudwXvfsmKHmKIScSE
x+IDCdscLmmm+WG2dLmLraWOVPu42xZFccoQCi4M3TTqfeB9pZ9XckFQ37zX62zG
5t+7Ek+t1W4QVt/JQYVKH03XT15sqUpVknvx0Hl4Y5TtbDOkFLkO8RN0/HyExDef
7iegS35kqTsM4EfZQ+9juKbI2JBAjHANcbj0V4dogqaRj6vr3akumBzUtuYqAofv
qU3s9skmLsEemOJC+ns2PT8vl5dyIoeDfH0r2XvGWxYqolMqJpA=
=sY4N
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm and dax updates from Dan Williams:
"There were multiple touches outside of drivers/nvdimm/ this round to
add cross arch compatibility to the devm_memremap_pages() interface,
enhance numa information for persistent memory ranges, and add a
zero_page_range() dax operation.
This cycle I switched from the patchwork api to Konstantin's b4 script
for collecting tags (from x86, PowerPC, filesystem, and device-mapper
folks), and everything looks to have gone ok there. This has all
appeared in -next with no reported issues.
Summary:
- Add support for region alignment configuration and enforcement to
fix compatibility across architectures and PowerPC page size
configurations.
- Introduce 'zero_page_range' as a dax operation. This facilitates
filesystem-dax operation without a block-device.
- Introduce phys_to_target_node() to facilitate drivers that want to
know resulting numa node if a given reserved address range was
onlined.
- Advertise a persistence-domain for of_pmem and papr_scm. The
persistence domain indicates where cpu-store cycles need to reach
in the platform-memory subsystem before the platform will consider
them power-fail protected.
- Promote numa_map_to_online_node() to a cross-kernel generic
facility.
- Save x86 numa information to allow for node-id lookups for reserved
memory ranges, deploy that capability for the e820-pmem driver.
- Pick up some miscellaneous minor fixes, that missed v5.6-final,
including a some smatch reports in the ioctl path and some unit
test compilation fixups.
- Fixup some flexible-array declarations"
* tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (29 commits)
dax: Move mandatory ->zero_page_range() check in alloc_dax()
dax,iomap: Add helper dax_iomap_zero() to zero a range
dax: Use new dax zero page method for zeroing a page
dm,dax: Add dax zero_page_range operation
s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
dax, pmem: Add a dax operation zero_page_range
pmem: Add functions for reading/writing page to/from pmem
libnvdimm: Update persistence domain value for of_pmem and papr_scm device
tools/test/nvdimm: Fix out of tree build
libnvdimm/region: Fix build error
libnvdimm/region: Replace zero-length array with flexible-array member
libnvdimm/label: Replace zero-length array with flexible-array member
ACPI: NFIT: Replace zero-length array with flexible-array member
libnvdimm/region: Introduce an 'align' attribute
libnvdimm/region: Introduce NDD_LABELING
libnvdimm/namespace: Enforce memremap_compat_align()
libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid
libnvdimm: Out of bounds read in __nd_ioctl()
acpi/nfit: improve bounds checking for 'func'
mm/memremap_pages: Introduce memremap_compat_align()
...
This reverts commit effd58c95f.
blk_queue_split() is causing excessive IO splitting -- because
blk_max_size_offset() depends on 'chunk_sectors' limit being set and
if it isn't (as is the case for DM targets!) it falls back to
splitting on a 'max_sectors' boundary regardless of offset.
"Fix" this by reverting back to _not_ using blk_queue_split() in
dm_process_bio() for normal IO (reads and writes). Long-term fix is
still TBD but it should focus on training blk_max_size_offset() to
call into a DM provided hook (to call DM's max_io_len()).
Test results from simple misaligned IO test on 4-way dm-striped device
with chunksize of 128K and stripesize of 512K:
xfs_io -d -c 'pread -b 2m 224s 4072s' /dev/mapper/stripe_dev
before this revert:
253,0 21 1 0.000000000 2206 Q R 224 + 4072 [xfs_io]
253,0 21 2 0.000008267 2206 X R 224 / 480 [xfs_io]
253,0 21 3 0.000010530 2206 X R 224 / 256 [xfs_io]
253,0 21 4 0.000027022 2206 X R 480 / 736 [xfs_io]
253,0 21 5 0.000028751 2206 X R 480 / 512 [xfs_io]
253,0 21 6 0.000033323 2206 X R 736 / 992 [xfs_io]
253,0 21 7 0.000035130 2206 X R 736 / 768 [xfs_io]
253,0 21 8 0.000039146 2206 X R 992 / 1248 [xfs_io]
253,0 21 9 0.000040734 2206 X R 992 / 1024 [xfs_io]
253,0 21 10 0.000044694 2206 X R 1248 / 1504 [xfs_io]
253,0 21 11 0.000046422 2206 X R 1248 / 1280 [xfs_io]
253,0 21 12 0.000050376 2206 X R 1504 / 1760 [xfs_io]
253,0 21 13 0.000051974 2206 X R 1504 / 1536 [xfs_io]
253,0 21 14 0.000055881 2206 X R 1760 / 2016 [xfs_io]
253,0 21 15 0.000057462 2206 X R 1760 / 1792 [xfs_io]
253,0 21 16 0.000060999 2206 X R 2016 / 2272 [xfs_io]
253,0 21 17 0.000062489 2206 X R 2016 / 2048 [xfs_io]
253,0 21 18 0.000066133 2206 X R 2272 / 2528 [xfs_io]
253,0 21 19 0.000067507 2206 X R 2272 / 2304 [xfs_io]
253,0 21 20 0.000071136 2206 X R 2528 / 2784 [xfs_io]
253,0 21 21 0.000072764 2206 X R 2528 / 2560 [xfs_io]
253,0 21 22 0.000076185 2206 X R 2784 / 3040 [xfs_io]
253,0 21 23 0.000077486 2206 X R 2784 / 2816 [xfs_io]
253,0 21 24 0.000080885 2206 X R 3040 / 3296 [xfs_io]
253,0 21 25 0.000082316 2206 X R 3040 / 3072 [xfs_io]
253,0 21 26 0.000085788 2206 X R 3296 / 3552 [xfs_io]
253,0 21 27 0.000087096 2206 X R 3296 / 3328 [xfs_io]
253,0 21 28 0.000093469 2206 X R 3552 / 3808 [xfs_io]
253,0 21 29 0.000095186 2206 X R 3552 / 3584 [xfs_io]
253,0 21 30 0.000099228 2206 X R 3808 / 4064 [xfs_io]
253,0 21 31 0.000101062 2206 X R 3808 / 3840 [xfs_io]
253,0 21 32 0.000104956 2206 X R 4064 / 4096 [xfs_io]
253,0 21 33 0.001138823 0 C R 4096 + 200 [0]
after this revert:
253,0 18 1 0.000000000 4430 Q R 224 + 3896 [xfs_io]
253,0 18 2 0.000018359 4430 X R 224 / 256 [xfs_io]
253,0 18 3 0.000028898 4430 X R 256 / 512 [xfs_io]
253,0 18 4 0.000033535 4430 X R 512 / 768 [xfs_io]
253,0 18 5 0.000065684 4430 X R 768 / 1024 [xfs_io]
253,0 18 6 0.000091695 4430 X R 1024 / 1280 [xfs_io]
253,0 18 7 0.000098494 4430 X R 1280 / 1536 [xfs_io]
253,0 18 8 0.000114069 4430 X R 1536 / 1792 [xfs_io]
253,0 18 9 0.000129483 4430 X R 1792 / 2048 [xfs_io]
253,0 18 10 0.000136759 4430 X R 2048 / 2304 [xfs_io]
253,0 18 11 0.000152412 4430 X R 2304 / 2560 [xfs_io]
253,0 18 12 0.000160758 4430 X R 2560 / 2816 [xfs_io]
253,0 18 13 0.000183385 4430 X R 2816 / 3072 [xfs_io]
253,0 18 14 0.000190797 4430 X R 3072 / 3328 [xfs_io]
253,0 18 15 0.000197667 4430 X R 3328 / 3584 [xfs_io]
253,0 18 16 0.000218751 4430 X R 3584 / 3840 [xfs_io]
253,0 18 17 0.000226005 4430 X R 3840 / 4096 [xfs_io]
253,0 18 18 0.000250404 4430 Q R 4120 + 176 [xfs_io]
253,0 18 19 0.000847708 0 C R 4096 + 24 [0]
253,0 18 20 0.000855783 0 C R 4120 + 176 [0]
Fixes: effd58c95f ("dm: always call blk_queue_split() in dm_process_bio()")
Cc: stable@vger.kernel.org
Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
Tested-by: Barry Marson <bmarson@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
zero_page_range() dax operation is mandatory for dax devices. Right now
that check happens in dax_zero_page_range() function. Dan thinks that's
too late and its better to do the check earlier in alloc_dax().
I also modified alloc_dax() to return pointer with error code in it in
case of failure. Right now it returns NULL and caller assumes failure
happened due to -ENOMEM. But with this ->zero_page_range() check, I
need to return -EINVAL instead.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://lore.kernel.org/r/20200401161125.GB9398@redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch adds support for dax zero_page_range operation to dm targets.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20200228163456.1587-5-vgoyal@redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Current make_request based drivers use either blk_alloc_queue_node or
blk_alloc_queue to allocate a queue, and then set up the make_request_fn
function pointer and a few parameters using the blk_queue_make_request
helper. Simplify this by passing the make_request pointer to
blk_alloc_queue, and while at it merge the _node variant into the main
helper by always passing a node_id, and remove the superfluous gfp_mask
parameter. A lower-level __blk_alloc_queue is kept for the blk-mq case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
These macros are just used by a few files. Move them out of genhd.h,
which is included everywhere into a new standalone header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We neither assign congested_fn for requested-based blk-mq device nor
implement it correctly. So fix both.
Also, remove incorrect comment from dm_init_normal_md_queue and rename
it to dm_init_congested_fn.
Fixes: 4aa9c692e0 ("bdi: separate out congested state into a separate struct")
Cc: stable@vger.kernel.org
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The function dm_suspended returns true if the target is suspended.
However, when the target is being suspended during unload, it returns
false.
An example where this is a problem: the test "!dm_suspended(wc->ti)" in
writecache_writeback is not sufficient, because dm_suspended returns
zero while writecache_suspend is in progress. As is, without an
enhanced dm_suspended, simply switching from flush_workqueue to
drain_workqueue still emits warnings:
workqueue writecache-writeback: drain_workqueue() isn't complete after 10 tries
workqueue writecache-writeback: drain_workqueue() isn't complete after 100 tries
workqueue writecache-writeback: drain_workqueue() isn't complete after 200 tries
workqueue writecache-writeback: drain_workqueue() isn't complete after 300 tries
workqueue writecache-writeback: drain_workqueue() isn't complete after 400 tries
writecache_suspend calls flush_workqueue(wc->writeback_wq) - this function
flushes the current work. However, the workqueue may re-queue itself and
flush_workqueue doesn't wait for re-queued works to finish. Because of
this - the function writecache_writeback continues execution after the
device was suspended and then concurrently with writecache_dtr, causing
a crash in writecache_writeback.
We must use drain_workqueue - that waits until the work and all re-queued
works finish.
As a prereq for switching to drain_workqueue, this commit fixes
dm_suspended to return true after the presuspend hook and before the
postsuspend hook - just like during a normal suspend. It allows
simplifying the dm-integrity and dm-writecache targets so that they
don't have to maintain suspended flags on their own.
With this change use of drain_workqueue() can be used effectively. This
change was tested with the lvm2 testsuite and cryptsetup testsuite and
the are no regressions.
Fixes: 48debafe4f ("dm: add writecache target")
Cc: stable@vger.kernel.org # 4.18+
Reported-by: Corey Marthaler <cmarthal@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Move blk_queue_make_request() to dm.c:alloc_dev() so that
q->make_request_fn is never NULL during the lifetime of a DM device
(even one that is created without a DM table).
Otherwise generic_make_request() will crash simply by doing:
dmsetup create -n test
mount /dev/dm-N /mnt
While at it, move ->congested_data initialization out of
dm.c:alloc_dev() and into the bio-based specific init method.
Reported-by: Stefan Bader <stefan.bader@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1860231
Fixes: ff36ab3458 ("dm: remove request-based logic from make_request_fn wrapper")
Depends-on: c12c9a3c38 ("dm: various cleanups to md->queue initialization code")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Avoid the need to allocate a potentially large array of struct blk_zone
in the block layer by switching the ->report_zones method interface to
a callback model. Now the caller simply supplies a callback that is
executed on each reported zone, and private data for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
No known partitioning tool supports zoned block devices, especially the
host managed flavor with strong sequential write constraints.
Furthermore, there are also no known user nor use cases for partitioned
zoned block devices.
This patch removes partition device creation for zoned block devices,
which allows simplifying the processing of zone commands for zoned
block devices. A warning is added if a partition table is found on the
device.
For report zones operations no zone sector information remapping is
necessary anymore, simplifying the code. Of note is that remapping of
zone reports for DM targets is still necessary as done by
dm_remap_zone_report().
Similarly, remaping of a zone reset bio is not necessary anymore.
Testing for the applicability of the zone reset all request also becomes
simpler and only needs to check that the number of sectors of the
requested zone range is equal to the disk capacity.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
All kernel users of blkdev_report_zones() as well as applications use
through ioctl(BLKZONEREPORT) expect to potentially get less zone
descriptors than requested. As such, the use of the internal report
zones command execution loop implemented by blk_report_zones() is
not necessary and can even be harmful to performance by causing the
execution of inefficient small zones report command to service the
reminder of a requested zone array.
This patch removes blk_report_zones(), simplifying the code. Also
remove a now incorrect comment in dm_blk_report_zones().
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Javier Gonzalez <javier@javigon.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Implement REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH
support to allow explicit control of zone states.
Contains contributions from Matias Bjorling, Hans Holmberg and
Damien Le Moal.
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, if we pass too high sector number to dm_table_find_target, it
returns zeroed dm_target structure and callers test if the structure is
zeroed with the macro dm_target_is_valid.
However, returning NULL is common practice to indicate errors.
This patch refactors the dm code, so that dm_table_find_target returns
NULL and its callers test the returned value for NULL. The macro
dm_target_is_valid is deleted. In alloc_targets, we no longer allocate an
extra zeroed target.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
persistent memory device that allows a guest VM to use DAX mechanisms to
access a host-file with host-page-cache. It arranges for MAP_SYNC to
be disabled and instead triggers a host fsync() when a 'write-cache
flush' command is sent to the virtual disk device.
- Miscellaneous small fixups.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJdMHwpAAoJEB7SkWpmfYgCUYoP/3vcgYBAaXNksyALF0iowPoP
z4J0KoaOA1CzRFEQtCWUQa84CWj+XoSewwSeyrIkqKQvx/gghXblK+GVjVzBn0BD
hmmiKr8af4DdxfzYdEXJp65cCpIiVMaJiGr20Aj9ObwvWJb4QZbz9q7hnPt6KgiI
jVND3BpP3OERb4ZFcibdmJT5foKooMcXVG6+luVe+hc1+ZZQxJBsBaqie4brQIFq
j59NX3HfHH2fr1vVwnVH0CO4tgbgYg9wZ2EivGu6wBWvORjrr7KiSSbOYP68EBtd
lUoNps+vQtGnfXGwNzAjp1wuknrQYYh4/KMKjep7hiZD39rgyvBpbHbyynKzQCWV
REe8cXr/nwphsENvBAUBiqY999EWVIxdT2iaVaSA6K/31JQAC5AFyxVK/P2Ke1SK
rvePZ++iLQ1o4phTxQPNlVUqF9jOrFVVICGwMDqaqSkOsD9YKQdFClfOF/1ntlDz
V0bs+Y0Pe8AJCd9ESep4X+vHAWRRIb4EQIuwLaX8RJoY+r1fGye9RPthpYYzvXKp
DI2iJztFO3anzj2i9htNPUFIaiUmIhzEvG32O2If2yc5FL02hMpHPoFx6vHhe6s3
f8OJ+olsJK+/IIrV8+DHqYvhzylOYIhmRTvIxIxaNDPHkhR1i2RDQ6KKK1YZmsr8
MjAZ+Ym0GadDivs+wcM6
=uAMG
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"Primarily just the virtio_pmem driver:
- virtio_pmem
The new virtio_pmem facility introduces a paravirtualized
persistent memory device that allows a guest VM to use DAX
mechanisms to access a host-file with host-page-cache. It arranges
for MAP_SYNC to be disabled and instead triggers a host fsync()
when a 'write-cache flush' command is sent to the virtual disk
device.
- Miscellaneous small fixups"
* tag 'libnvdimm-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
virtio_pmem: fix sparse warning
xfs: disable map_sync for async flush
ext4: disable map_sync for async flush
dax: check synchronous mapping is supported
dm: enable synchronous dax
libnvdimm: add dax_dev sync flag
virtio-pmem: Add virtio pmem driver
libnvdimm: nd_region flush callback support
libnvdimm, namespace: Drop uuid_t implementation detail
Only GFP_KERNEL and GFP_NOIO are used with blkdev_report_zones(). In
preparation of using vmalloc() for large report buffer and zone array
allocations used by this function, remove its "gfp_t gfp_mask" argument
and rely on the caller context to use memalloc_noio_save/restore() where
necessary (block layer zone revalidation and dm-zoned I/O error path).
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch sets dax device 'DAXDEV_SYNC' flag if all the target
devices of device mapper support synchrononous DAX. If device
mapper consists of both synchronous and asynchronous dax devices,
we don't set 'DAXDEV_SYNC' flag.
'dm_table_supports_dax' is refactored to pass 'iterate_devices_fn'
as argument so that the callers can pass the appropriate functions.
Suggested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch adds 'DAXDEV_SYNC' flag which is set
for nd_region doing synchronous flush. This later
is used to disable MAP_SYNC functionality for
ext4 & xfs filesystem for devices don't support
synchronous flush.
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Commit 61697a6abd ("dm: eliminate 'split_discard_bios' flag from DM
target interface") incorrectly removed code from
__send_changing_extent_only() that is required to impose a per-target IO
boundary on IO that exceeds max_io_len_target_boundary(). Otherwise
"special" IO (e.g. DISCARD, WRITE SAME, WRITE ZEROES) can write beyond
where allowed.
Fix this by restoring the max_io_len_target_boundary() limit in
__send_changing_extent_only()
Fixes: 61697a6abd ("dm: eliminate 'split_discard_bios' flag from DM target interface")
Cc: stable@vger.kernel.org # 5.1+
Signed-off-by: Michael Lass <bevan@bi-co.net>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pankaj reports that starting with commit ad428cdb52 "dax: Check the
end of the block-device capacity with dax_direct_access()" device-mapper
no longer allows dax operation. This results from the stricter checks in
__bdev_dax_supported() that validate that the start and end of a
block-device map to the same 'pagemap' instance.
Teach the dax-core and device-mapper to validate the 'pagemap' on a
per-target basis. This is accomplished by refactoring the
bdev_dax_supported() internals into generic_fsdax_supported() which
takes a sector range to validate. Consequently generic_fsdax_supported()
is suitable to be used in a device-mapper ->iterate_devices() callback.
A new ->dax_supported() operation is added to allow composite devices to
split and route upper-level bdev_dax_supported() requests.
Fixes: ad428cdb52 ("dax: Check the end of the block-device...")
Cc: <stable@vger.kernel.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reported-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Tested-by: Pankaj Gupta <pagupta@redhat.com>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
md->dax_dev defaults to NULL and there is no need to initialize it
if CONFIG_DAX_DRIVER is disabled.
Signed-off-by: Peng Wang <rocking@whu.edu.cn>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Storage devices which report supporting discard commands like
WRITE_SAME_16 with unmap, but reject discard commands sent to the
storage device. This is a clear storage firmware bug but it doesn't
change the fact that should a program cause discards to be sent to a
multipath device layered on this buggy storage, all paths can end up
failed at the same time from the discards, causing possible I/O loss.
The first discard to a path will fail with Illegal Request, Invalid
field in cdb, e.g.:
kernel: sd 8:0:8:19: [sdfn] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel: sd 8:0:8:19: [sdfn] tag#0 Sense Key : Illegal Request [current]
kernel: sd 8:0:8:19: [sdfn] tag#0 Add. Sense: Invalid field in cdb
kernel: sd 8:0:8:19: [sdfn] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 a0 08 00 00 00 80 00 00 00
kernel: blk_update_request: critical target error, dev sdfn, sector 10487808
The SCSI layer converts this to the BLK_STS_TARGET error number, the sd
device disables its support for discard on this path, and because of the
BLK_STS_TARGET error multipath fails the discard without failing any
path or retrying down a different path. But subsequent discards can
cause path failures. Any discards sent to the path which already failed
a discard ends up failing with EIO from blk_cloned_rq_check_limits with
an "over max size limit" error since the discard limit was set to 0 by
the sd driver for the path. As the error is EIO, this now fails the
path and multipath tries to send the discard down the next path. This
cycle continues as discards are sent until all paths fail.
Fix this by training DM core to disable DISCARD if the underlying
storage already did so.
Also, fix branching in dm_done() and clone_endio() to reflect the
mutually exclussive nature of the IO operations in question.
Cc: stable@vger.kernel.org
Reported-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The limit was already incorporated to dm-crypt with commit 4e870e948f
("dm crypt: fix error with too large bios"), so we don't need to apply
it globally to all targets. The quantity BIO_MAX_PAGES * PAGE_SIZE is
wrong anyway because the variable ti->max_io_len it is supposed to be in
the units of 512-byte sectors not in bytes.
Reduction of the limit to 1048576 sectors could even cause data
corruption in rare cases - suppose that we have a dm-striped device with
stripe size 768MiB. The target will call dm_set_target_max_io_len with
the value 1572864. The buggy code would reduce it to 1048576. Now, the
dm-core will errorneously split the bios on 1048576-sector boundary
insetad of 1572864-sector boundary and pass these stripe-crossing bios
to the striped target.
Cc: stable@vger.kernel.org # v4.16+
Fixes: 8f50e35815 ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Do not just call blk_queue_split() if the bio is_abnormal_io().
Fixes: 568c73a355 ("dm: update dm_process_bio() to split bio if in ->make_request_fn()")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
There is no need to have DM core split discards on behalf of a DM target
now that blk_queue_split() handles splitting discards based on the
queue_limits. A DM target just needs to set max_discard_sectors,
discard_granularity, etc, in queue_limits.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Must call blk_queue_split() otherwise queue_limits for abnormal requests
(e.g. discard, writesame, etc) won't be imposed.
In addition, add dm_queue_split() to simplify DM specific splitting that
is needed for targets that impose ti->max_io_len.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
bio_trim() has an early return, which makes it _not_ idempotent, if the
offset is 0 and the bio's bi_size already matches the requested size.
Prior to DM, all users of bio_trim() were fine with this. But DM has
exposed the fact that bio_trim()'s early return is incompatible with a
cloned bio whose integrity payload must be trimmed via
bio_integrity_trim().
Fix this by reverting DM back to doing the equivalent of bio_trim() but
in an idempotent manner (so bio_integrity_trim is always performed).
Follow-on work is needed to assess what benefit bio_trim()'s early
return is providing to its existing callers.
Reported-by: Milan Broz <gmazyland@gmail.com>
Fixes: 57c36519e4 ("dm: fix clone_bio() to trigger blk_recount_segments()")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Block core changes to switch bio-based IO accounting to be percpu had a
side-effect of altering DM core to now rely on calling waitqueue_active
(in both bio-based and request-based) to check if another task is in
dm_wait_for_completion().
A memory barrier is needed before calling waitqueue_active(). DM core
doesn't piggyback on a preceding memory barrier so it must explicitly
use its own.
For more details on why using waitqueue_active() without a preceding
barrier is unsafe, please see the comment before the waitqueue_active()
definition in include/linux/wait.h.
Add the missing memory barrier by switching to using wq_has_sleeper().
Fixes: 6f75723190 ("dm: remove the pending IO accounting")
Fixes: c4576aed8d ("dm: fix request-based dm's use of dm_wait_for_completion")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The risk of redundant IO accounting was not taken into consideration
when commit 18a25da843 ("dm: ensure bio submission follows a
depth-first tree walk") introduced IO splitting in terms of recursion
via generic_make_request().
Fix this by subtracting the split bio's payload from the IO stats that
were already accounted for by start_io_acct() upon dm_make_request()
entry. This repeat oscillation of the IO accounting, up then down,
isn't ideal but refactoring DM core's IO splitting to pre-split bios
_before_ they are accounted turned out to be an excessive amount of
change that will need a full development cycle to refine and verify.
Before this fix:
/dev/mapper/stripe_dev is a 4-way stripe using a 32k chunksize, so
bios are split on 32k boundaries.
# fio --name=16M --filename=/dev/mapper/stripe_dev --rw=write --bs=64k --size=16M \
--iodepth=1 --ioengine=libaio --direct=1 --refill_buffers
with debugging added:
[103898.310264] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=0 len=128
[103898.318704] device-mapper: core: __split_and_process_bio: recursing for following split bio:
[103898.329136] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=64 len=64
...
16M written yet 136M (278528 * 512b) accounted:
# cat /sys/block/dm-2/stat | awk '{ print $7 }'
278528
After this fix:
16M written and 16M (32768 * 512b) accounted:
# cat /sys/block/dm-2/stat | awk '{ print $7 }'
32768
Fixes: 18a25da843 ("dm: ensure bio submission follows a depth-first tree walk")
Cc: stable@vger.kernel.org # 4.16+
Reported-by: Bryan Gurney <bgurney@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
DM's clone_bio() now benefits from using bio_trim() by fixing the fact
that clone_bio() wasn't clearing BIO_SEG_VALID like bio_trim() does;
which triggers blk_recount_segments() via bio_phys_segments().
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
- Fix DM to allow reads that exceed readahead limits by setting io_pages
in the backing_dev_info.
- A couple code cleanups in request-based DM.
- Fix various DM targets to check for device sector overflow if
CONFIG_LBDAF is not set.
- Use u64 instead of sector_t to store iv_offset in DM crypt; sector_t
isn't large enough on 32bit when CONFIG_LBDAF is not set.
- Performance fixes to DM's kcopyd and the snapshot target focused on
limiting memory use and workqueue stalls.
- Fix typos in the integrity and writecache targets.
- Log which algorithm is used for dm-crypt's encryption and
dm-integrity's hashing.
- Fix false -EBUSY errors in DM raid target's handling of check/repair
messages.
- Fix DM flakey target's corrupt_bio_byte feature to reliably corrupt
the Nth byte in a bio's payload.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJcHu6MAAoJEMUj8QotnQNar9gIAKyJ5hH0ik3mlLhAyPcMkRU+
MPvjTAC8V5V33VY8YqKD/TcDHY8yaGTPKSK0SAnB8Jv2tLz8mp0H18syJbMvNcFc
MS/xWYLAZiXf5xJ4eJpo8ayDvU6DAkKtSbG7VHJ8gnx/WV8bW1b90OCIq8GN+jHo
DPp6RDzy3LvonoZ9OPognQKUDXK1crdPDiv1Wzw9Zjd4JlruIaGgJOc1RLB7YdVK
u5c7UD7cT+MCROEzJrctAZa5oRVhuemd83HPvwE2akRFvCSNtc7aIVtNfJ79nqL7
19oUDfsuzf2zkiJLzO0oU4OrJ+lUGDJf/sPXmU1qlPyKdUqYKFxT+KHvBnhAQXI=
=pCs8
-----END PGP SIGNATURE-----
Merge tag 'for-4.21/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Eliminate a couple indirect calls from bio-based DM core.
- Fix DM to allow reads that exceed readahead limits by setting
io_pages in the backing_dev_info.
- A couple code cleanups in request-based DM.
- Fix various DM targets to check for device sector overflow if
CONFIG_LBDAF is not set.
- Use u64 instead of sector_t to store iv_offset in DM crypt; sector_t
isn't large enough on 32bit when CONFIG_LBDAF is not set.
- Performance fixes to DM's kcopyd and the snapshot target focused on
limiting memory use and workqueue stalls.
- Fix typos in the integrity and writecache targets.
- Log which algorithm is used for dm-crypt's encryption and
dm-integrity's hashing.
- Fix false -EBUSY errors in DM raid target's handling of check/repair
messages.
- Fix DM flakey target's corrupt_bio_byte feature to reliably corrupt
the Nth byte in a bio's payload.
* tag 'for-4.21/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: do not allow readahead to limit IO size
dm raid: fix false -EBUSY when handling check/repair message
dm rq: cleanup leftover code from recently removed q->mq_ops branching
dm verity: log the hash algorithm implementation
dm crypt: log the encryption algorithm implementation
dm integrity: fix spelling mistake in workqueue name
dm flakey: Properly corrupt multi-page bios.
dm: Check for device sector overflow if CONFIG_LBDAF is not set
dm crypt: use u64 instead of sector_t to store iv_offset
dm kcopyd: Fix bug causing workqueue stalls
dm snapshot: Fix excessive memory usage and workqueue stalls
dm bufio: update comment in dm-bufio.c
dm writecache: fix typo in error msg for creating writecache_flush_thread
dm: remove indirect calls from __send_changing_extent_only()
dm mpath: only flush workqueue when needed
dm rq: remove unused arguments from rq_completed()
dm: avoid indirect call in __dm_make_request
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlwb7R8QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpjiID/97oDjMhNT7rwpuMbHw855h62j1hEN/m+N3
FI0uxivYoYZLD+eJRnMcBwHlKjrCX8iJQAcv9ffI3ThtFW7dnZT3atUacaZVR/Dt
IrxdymdBP3qsmuaId5NYBug7rJ+AiqFJKjEvCcSPu5X397J4I3SEbzhfvYLJ/aZX
16o0HJlVVIrcbmq1IP4HwiIIOaKXvPaw04L4z4fpeynRSWG7EAi8NLSnhlR4Rxbb
BTiMkCTsjRCFdyO6da4fvNQKWmPGPa3bJkYy3qR99cvJCeIbQjRyCloQlWNJRRgi
3eJpCHVxqFmN0/+DNTJVQEEr4H8o0AVucrLVct1Jc4pessenkpoUniP8vELqwlng
Z2VHLkhTfCEmvFlk82grrYdNvGATRsrbswt/PlP4T7rBfr1IpDk8kXDWF59EL2dy
ly35Sk3wJGHBl8qa+vEPXOAnaWdqJXuVGpwB4ifOIatOls8mOxwfZjiRc7x05/fC
1O4rR2IfLwRqwoYHs0AJ+h6ohOSn1mkGezl2Tch1VSFcJUOHmuYvraTaUi6hblpA
SslaAoEhO39hRBL0HsvsMeqVWM9uzqvFkLDCfNPdiA81H1258CIbo4vF8z6czCIS
eeXnTJxVhPVbZgb3a1a93SPwM6KIDZFoIijyd+NqjpU94thlnhYD0QEcKJIKH7os
2p4aHs6ktw==
=TRdW
-----END PGP SIGNATURE-----
Merge tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
"This is the main pull request for block/storage for 4.21.
Larger than usual, it was a busy round with lots of goodies queued up.
Most notable is the removal of the old IO stack, which has been a long
time coming. No new features for a while, everything coming in this
week has all been fixes for things that were previously merged.
This contains:
- Use atomic counters instead of semaphores for mtip32xx (Arnd)
- Cleanup of the mtip32xx request setup (Christoph)
- Fix for circular locking dependency in loop (Jan, Tetsuo)
- bcache (Coly, Guoju, Shenghui)
* Optimizations for writeback caching
* Various fixes and improvements
- nvme (Chaitanya, Christoph, Sagi, Jay, me, Keith)
* host and target support for NVMe over TCP
* Error log page support
* Support for separate read/write/poll queues
* Much improved polling
* discard OOM fallback
* Tracepoint improvements
- lightnvm (Hans, Hua, Igor, Matias, Javier)
* Igor added packed metadata to pblk. Now drives without metadata
per LBA can be used as well.
* Fix from Geert on uninitialized value on chunk metadata reads.
* Fixes from Hans and Javier to pblk recovery and write path.
* Fix from Hua Su to fix a race condition in the pblk recovery
code.
* Scan optimization added to pblk recovery from Zhoujie.
* Small geometry cleanup from me.
- Conversion of the last few drivers that used the legacy path to
blk-mq (me)
- Removal of legacy IO path in SCSI (me, Christoph)
- Removal of legacy IO stack and schedulers (me)
- Support for much better polling, now without interrupts at all.
blk-mq adds support for multiple queue maps, which enables us to
have a map per type. This in turn enables nvme to have separate
completion queues for polling, which can then be interrupt-less.
Also means we're ready for async polled IO, which is hopefully
coming in the next release.
- Killing of (now) unused block exports (Christoph)
- Unification of the blk-rq-qos and blk-wbt wait handling (Josef)
- Support for zoned testing with null_blk (Masato)
- sx8 conversion to per-host tag sets (Christoph)
- IO priority improvements (Damien)
- mq-deadline zoned fix (Damien)
- Ref count blkcg series (Dennis)
- Lots of blk-mq improvements and speedups (me)
- sbitmap scalability improvements (me)
- Make core inflight IO accounting per-cpu (Mikulas)
- Export timeout setting in sysfs (Weiping)
- Cleanup the direct issue path (Jianchao)
- Export blk-wbt internals in block debugfs for easier debugging
(Ming)
- Lots of other fixes and improvements"
* tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block: (364 commits)
kyber: use sbitmap add_wait_queue/list_del wait helpers
sbitmap: add helpers for add/del wait queue handling
block: save irq state in blkg_lookup_create()
dm: don't reuse bio for flushes
nvme-pci: trace SQ status on completions
nvme-rdma: implement polling queue map
nvme-fabrics: allow user to pass in nr_poll_queues
nvme-fabrics: allow nvmf_connect_io_queue to poll
nvme-core: optionally poll sync commands
block: make request_to_qc_t public
nvme-tcp: fix spelling mistake "attepmpt" -> "attempt"
nvme-tcp: fix endianess annotations
nvmet-tcp: fix endianess annotations
nvme-pci: refactor nvme_poll_irqdisable to make sparse happy
nvme-pci: only set nr_maps to 2 if poll queues are supported
nvmet: use a macro for default error location
nvmet: fix comparison of a u16 with -1
blk-mq: enable IO poll if .nr_queues of type poll > 0
blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()
blk-mq: skip zero-queue maps in blk_mq_map_swqueue
...
DM currently has a statically allocated bio that it uses to issue empty
flushes. It doesn't submit this bio, it just uses it for maintaining
state while setting up clones. Multiple users can access this bio at the
same time. This wasn't previously an issue, even if it was a bit iffy,
but with the blkg associations it can become one.
We setup the blkg association, then clone bio's and submit, then remove
the blkg assocation again. But since we can have multiple tasks doing
this at the same time, against multiple blkg's, then we can either lose
references to a blkg, or put it twice. The latter causes complaints on
the percpu ref being <= 0 when released, and can cause use-after-free as
well. Ming reports that xfstest generic/475 triggers this:
------------[ cut here ]------------
percpu ref (blkg_release) <= 0 (0) after switching to atomic
WARNING: CPU: 13 PID: 0 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x2c9/0x4a0
Switch to just using an on-stack bio for this, and get rid of the
embedded bio.
Fixes: 5cdf2e3fea ("blkcg: associate blkg when associating a device")
Reported-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Indirect calls are inefficient because of retpolines that are used for
spectre workaround. This patch replaces an indirect call with a condition
(that can be predicted by the branch predictor).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
There's a single user of this function, dm, and dm just wants
to check if IO is inflight, not that it's just allocated.
This fixes a hang with srp/002 in blktests with dm, where it tries
to suspend but waits for inflight IO to finish first. As it checks
for just allocated requests, this fails.
Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The md->wait waitqueue is used by both bio-based and request-based DM.
Commit dbd3bbd291 ("dm rq: leverage blk_mq_queue_busy() to check for
outstanding IO") lost sight of the requirement that
dm_wait_for_completion() must work with all types of DM devices.
Fix md_in_flight() to call the blk-mq or bio-based method accordingly.
Fixes: dbd3bbd291 ("dm rq: leverage blk_mq_queue_busy() to check for outstanding IO")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
After switching to percpu inflight counters, the inflight check
is totally buggy. It's perfectly valid for some counters to be
non-zero while having a total inflight IO count of 0, that's how
these kinds of counters work (inc on one CPU, dec on another).
Fix the md_in_flight() check to sum all counters before returning
a false positive, potentially.
While at it, remove the inflight read for IO completion. We don't
need it, just wake anyone that's waiting for the IO count to drop
to zero. The caller needs to re-check that value anyway when woken,
which it does.
Fixes: 6f75723190 ("dm: remove the pending IO accounting")
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
generic_start_io_acct and generic_end_io_acct already update the variable
in_flight using atomic operations, so we don't have to overwrite them
again.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The next patch changes the macro bio_set_dev() to associate a bio with a
blkg based on the device set. However, dm creates a static bio to be
used as the basis for cloning empty flush bios on creation. The
bio_set_dev() call in alloc_dev() will cause problems with the next
patch adding association to bio_set_dev() because the call is before the
bdev is associated with a gendisk (bd_disk is %NULL). To get around
this, set the device on the static bio every time and use that to clone
to the other bios.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: Alasdair Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Otherwise the incoming bios, of various types, won't be shaped based on
the DM device's advertised limits.
Depends-on: af67c31fba ("blk: remove bio_set arg from blk_queue_split()")
Fixes: 744889b7cb ("block: don't deal with discard limit in blkdev_issue_discard()")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
With the legacy request path gone there is no real need to override the
queue_lock.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(.request_fn) from request-based DM. Jens has already started
preparing for complete removal of the legacy IO path in 4.21 but this
earlier removal of support from DM has been coordinated with Jens (as
evidenced by the commit being attributed to him). Making
request-based DM exclussively blk-mq only cleans up that portion of DM
core quite nicely.
- Convert the thinp and zoned targets over to using refcount_t where
applicable.
- A couple fixes to the DM zoned target for refcounting and other races
buried in the implementation of metadata block creation and use.
- Small cleanups to remove redundant unlikely() around a couple
WARN_ON_ONCE().
- Simplify how dm-ioctl copies from userspace, eliminating some
potential for a malicious user trying to change the executed ioctl
after its processing has begun.
- Tweaked DM crypt target to use the DM device name when naming the
various workqueues created for a particular DM crypt device (makes the
N workqueues for a DM crypt device more easily understood and enhances
user's accounting capabilities at a glance via "ps")
- Small fixup to remove dead branch in DM writecache's memory_entry().
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJb0zCmAAoJEMUj8QotnQNaUKwIAKFC3cuTtQyh3LomSbT4uAr5
6apBSVcxVhqU+isriW3HmBPkO4HGyqMWjX5oQGHrOj0YK0i1H65Nq3qH9ATaiSHn
awdo8A4YmClF5Mojc51UebXIH0IfnGSOKH/FHNhQzT3jAdn+vYinMSZ28JwFPgKW
DsVOSM1dlJZBWRXhQNpyCjVl9Xb3rRUOnkfG0endyMfOsnoxKurhwSkXoStzCdQn
O5ubt1XT3wMKoI1k9QWjfrBU1NtZZYD+kQ6EfkYXfL9RNhhZjwzO/eNtqT3jnKsq
qbcd8/0JIUttPf7+F0URG9mbMbebfJGqNAaJWcnlRbCHmgUBBGVWsnl8MTWOLkw=
=MdDN
-----END PGP SIGNATURE-----
Merge tag 'for-4.20/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- The biggest change this cycle is to remove support for the legacy IO
path (.request_fn) from request-based DM.
Jens has already started preparing for complete removal of the legacy
IO path in 4.21 but this earlier removal of support from DM has been
coordinated with Jens (as evidenced by the commit being attributed to
him).
Making request-based DM exclussively blk-mq only cleans up that
portion of DM core quite nicely.
- Convert the thinp and zoned targets over to using refcount_t where
applicable.
- A couple fixes to the DM zoned target for refcounting and other races
buried in the implementation of metadata block creation and use.
- Small cleanups to remove redundant unlikely() around a couple
WARN_ON_ONCE().
- Simplify how dm-ioctl copies from userspace, eliminating some
potential for a malicious user trying to change the executed ioctl
after its processing has begun.
- Tweaked DM crypt target to use the DM device name when naming the
various workqueues created for a particular DM crypt device (makes
the N workqueues for a DM crypt device more easily understood and
enhances user's accounting capabilities at a glance via "ps")
- Small fixup to remove dead branch in DM writecache's memory_entry().
* tag 'for-4.20/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm writecache: remove disabled code in memory_entry()
dm zoned: fix various dmz_get_mblock() issues
dm zoned: fix metadata block ref counting
dm raid: avoid bitmap with raid4/5/6 journal device
dm crypt: make workqueue names device-specific
dm: add dm_table_device_name()
dm ioctl: harden copy_params()'s copy_from_user() from malicious users
dm: remove unnecessary unlikely() around WARN_ON_ONCE()
dm zoned: target: use refcount_t for dm zoned reference counters
dm thin: use refcount_t for thin_c reference counting
dm table: require that request-based DM be layered on blk-mq devices
dm: rename DM_TYPE_MQ_REQUEST_BASED to DM_TYPE_REQUEST_BASED
dm: remove legacy request-based IO path
Dispatching a report zones command through the request queue is a major
pain due to the command reply payload rewriting necessary. Given that
blkdev_report_zones() is executing everything synchronously, implement
report zones as a block device file operation instead, allowing major
simplification of the code in many places.
sd, null-blk, dm-linear and dm-flakey being the only block device
drivers supporting exposing zoned block devices, these drivers are
modified to provide the device side implementation of the
report_zones() block device file operation.
For device mappers, a new report_zones() target type operation is
defined so that the upper block layer calls blkdev_report_zones() can
be propagated down to the underlying devices of the dm targets.
Implementation for this new operation is added to the dm-linear and
dm-flakey targets.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[Damien]
* Changed method block_device argument to gendisk
* Various bug fixes and improvements
* Added support for null_blk, dm-linear and dm-flakey.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>