linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-22 04:02:20 +00:00

Author	SHA1	Message	Date
Christoph Hellwig	bf86bcdb40	blk-lib: check for kill signal in ioctl BLKZEROOUT Zeroout can access a significant capacity and take longer than the user expected. A user may change their mind about wanting to run that command and attempt to kill the process and do something else with their device. But since the task is uninterruptable, they have to wait for it to finish, which could be many hours. Add a new BLKDEV_ZERO_KILLABLE flag for blkdev_issue_zeroout that checks for a fatal signal at each iteration so the user doesn't have to wait for their regretted operation to complete naturally. Heavily based on an earlier patch from Keith Busch. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240701165219.1571322-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-05 00:53:15 -06:00
Christoph Hellwig	39722a2f2b	block: limit the Write Zeroes to manually writing zeroes fallback Only fall back from hardware Write Zeroes failures when blkdev_issue_write_zeroes returns -EOPNOTSUPP; Note that blkdev_issue_write_zeroes turns any failure into -EOPNOTSUPP when the write zeroes queue limit has been cleared to 0, so this still catches all I/O errors where the driver detected missing support for the hardware acceleration. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240701165219.1571322-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-05 00:53:15 -06:00
Christoph Hellwig	99800ced26	block: refacto blkdev_issue_zeroout Split out two well-defined helpers for hardware supported Write Zeroes and manually writing zeroes using the Write command. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240701165219.1571322-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-05 00:53:15 -06:00
Christoph Hellwig	f6eacb2654	block: move read-only and supported checks into (__)blkdev_issue_zeroout Move these checks out of the lower level helpers and into the higher level ones to prepare for refactoring. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240701165219.1571322-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-05 00:53:15 -06:00
Christoph Hellwig	ff760a8f0d	block: remove the LBA alignment check in __blkdev_issue_zeroout __blkdev_issue_zeroout is a purely kernel internal API and thus can rely on the block layer sector alignment checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240701165219.1571322-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-05 00:53:15 -06:00
Christoph Hellwig	73a768d5f9	block: factor out a blk_write_zeroes_limit helper Contrary to the comment in __blkdev_issue_write_zeroes, nothing here checks for a potential bi_size overflow. Add a helper mirroring the secure erase code for the check. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240701165219.1571322-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-05 00:53:15 -06:00
Damien Le Moal	2f20872ed4	block: Remove blk_alloc_zone_bitmap() Remove the helper function blk_alloc_zone_bitmap() and replace its single call site with a call to bitmap_alloc(). To be consistent with this change, use bitmap_free() to free a disk convnetional zone bitmap. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240704052816.623865-6-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-05 00:42:04 -06:00
Damien Le Moal	f2a7bea237	block: Remove REQ_OP_ZONE_RESET_ALL emulation Now that device mapper can handle resetting all zones of a mapped zoned device using REQ_OP_ZONE_RESET_ALL, all zoned block device drivers support this operation. With this, the request queue feature BLK_FEAT_ZONE_RESETALL is not necessary and the emulation code in blk-zone.c can be removed. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240704052816.623865-5-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-05 00:42:04 -06:00
Anuj Gupta	ba94223805	block: reuse original bio_vec array for integrity during clone Modify bio_integrity_clone to reuse the original bvec array instead of allocating and copying it, similar to how bio data path is cloned. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240702100753.2168-1-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-04 02:07:58 -06:00
Bart Van Assche	39823b47bb	block/mq-deadline: Fix the tag reservation code The current tag reservation code is based on a misunderstanding of the meaning of data->shallow_depth. Fix the tag reservation code as follows: * By default, do not reserve any tags for synchronous requests because for certain use cases reserving tags reduces performance. See also Harshit Mogalapalli, [bug-report] Performance regression with fio sequential-write on a multipath setup, 2024-03-07 (https://lore.kernel.org/linux-block/5ce2ae5d-61e2-4ede-ad55-551112602401@oracle.com/) * Reduce min_shallow_depth to one because min_shallow_depth must be less than or equal any shallow_depth value. * Scale dd->async_depth from the range [1, nr_requests] to [1, bits_per_sbitmap_word]. Cc: Christoph Hellwig <hch@lst.de> Cc: Damien Le Moal <dlemoal@kernel.org> Cc: Zhiguo Niu <zhiguo.niu@unisoc.com> Fixes: `07757588e5` ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240509170149.7639-3-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-02 08:47:45 -06:00
Bart Van Assche	6259151c04	block: Call .limit_depth() after .hctx has been set Call .limit_depth() after data->hctx has been set such that data->hctx can be used in .limit_depth() implementations. Cc: Christoph Hellwig <hch@lst.de> Cc: Damien Le Moal <dlemoal@kernel.org> Cc: Zhiguo Niu <zhiguo.niu@unisoc.com> Fixes: `07757588e5` ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240509170149.7639-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-02 08:47:45 -06:00
Christoph Hellwig	37105615f7	block: don't reduce max_sectors based on io_opt Don't reduce the max_sectors value below the normal cap when the driver advertsizes a very low io_opt. This restores the behavior we had before the recent changes to the max_sectors calculation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20240701051800.1245240-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-01 06:52:42 -06:00
Christoph Hellwig	f62e8edc0a	block: remove a duplicate io_min check in blk_validate_limits If io_min is larger than the cap, it must by definition be non-zero. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20240701051800.1245240-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-01 06:52:42 -06:00
Baokun Li	4e63aeb5d0	blk-wbt: don't throttle swap writes in direct reclaim Now we avoid throttling swap writes by determining whether the current process is kswapd (aka current_is_kswapd()), but swap writes can come from either kswapd or direct reclaim, so the swap writes from direct reclaim will still be throttled. When a process holds a lock to allocate a free page, and enters direct reclaim because there is no free memory, then it might trigger a hung due to the wbt throttling that causes other processes to fail to get the lock. Both kswapd and direct reclaim set the REQ_SWAP flag, so use REQ_SWAP instead of current_is_kswapd() to avoid throttling swap writes. Also renamed WBT_KSWAPD to WBT_SWAP and WBT_RWQ_KSWAPD to WBT_RWQ_SWAP. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240604030522.3686177-1-libaokun@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-07-01 06:51:53 -06:00
Christoph Hellwig	62e35f9422	block: pass a gendisk to the queue_sysfs_entry methods The kobject for the queue entries is embedded into a struct gendisk. Pass it to the sysfs methods instead of the request_queue derived from it. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240627111407.476276-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-28 15:06:16 -06:00
Christoph Hellwig	319e8cfdf3	block: add helper macros to de-duplicate the queue sysfs attributes A lof the code to implement the queue sysfs attributes is repetitive. Add a few macros to generate the common cases. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240627111407.476276-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-28 15:06:16 -06:00
Yu Kuai	1beabab88e	blk-throttle: fix lower control under super low iops limit User will configure allowed iops limit in 1s, and calculate_io_allowed() will calculate allowed iops in the slice by: limit * HZ / throtl_slice However, if limit is quite low, the result can be 0, then allowed IO in the slice is 0, this will cause missing dispatch and control will be lower than limit. For example, set iops_limit to 5 with HD disk, and test will found that iops will be 3. This is usually not a big deal, because user will unlikely to configure such low iops limit, however, this is still a problem in the extreme scene. Fix the problem by making sure the wait time calculated by tg_within_iops_limit() should allow at least one IO to be dispatched. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20240618062108.3680835-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-28 14:55:02 -06:00
Anuj Gupta	3991657ae7	block: set bip_vcnt correctly Set the bip_vcnt correctly in bio_integrity_init_user and bio_integrity_copy_user. If the bio gets split at a later point, this value is required to set the right bip_vcnt in the cloned bio. Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240626100700.3629-3-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-28 14:34:54 -06:00
Ming Lei	0676c434a9	block: check bio alignment in blk_mq_submit_bio IO logical block size is one fundamental queue limit, and every IO has to be aligned with logical block size because our bio split can't deal with unaligned bio. The check has to be done with queue usage counter grabbed because device reconfiguration may change logical block size, and we can prevent the reconfiguration from happening by holding queue usage counter. logical_block_size stays in the 1st cache line of queue_limits, and this cache line is always fetched in fast path via bio_may_exceed_limits(), so IO perf won't be affected by this check. Cc: Yi Zhang <yi.zhang@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Ye Bin <yebin10@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240620030631.3114026-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-28 10:31:34 -06:00
Christoph Hellwig	d19b46340b	block: remove bio_integrity_process Move the bvec interation into the generate/verify helpers to avoid a bit of argument passing churn. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240626045950.189758-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-28 10:29:42 -06:00
Christoph Hellwig	df3c485e0e	block: switch on bio operation in bio_integrity_prep Use a single switch to perform read and write specific checks and exit early for other operations instead of having two checks using different predicates. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240626045950.189758-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-28 10:29:41 -06:00
Christoph Hellwig	dac18fabba	block: remove allocation failure warnings in bio_integrity_prep Allocation failures already leave a stack trace, so don't spew another warning. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240626045950.189758-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-28 10:29:41 -06:00
Christoph Hellwig	c096df9083	block: simplify adding the payload in bio_integrity_prep bio_integrity_add_page can add physically contiguous regions of any size, so don't bother chunking up the kmalloced buffer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240626045950.189758-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-28 10:29:41 -06:00
Christoph Hellwig	c546d6f438	block: only zero non-PI metadata tuples in bio_integrity_prep The PI generation helpers already zero the app tag, so relax the zeroing to non-PI metadata. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240626045950.189758-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-28 10:29:41 -06:00
John Garry	63db4a1f79	block: Delete blk_queue_flag_test_and_set() Since commit `70200574cc` ("block: remove QUEUE_FLAG_DISCARD"), blk_queue_flag_test_and_set() has not been used, so delete it. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240627160735.842189-1-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-27 12:43:27 -06:00
Li Nan	e269537e49	block: clean up the check in blkdev_iomap_begin() It is odd to check the offset amidst a series of assignments. Moving this check to the beginning of the function makes the code look better. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240625115517.1472120-1-linan666@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-27 05:56:35 -06:00
Christoph Hellwig	e94b45d08b	block: move dma_pad_mask into queue_limits dma_pad_mask is a queue_limits by all ways of looking at it, so move it there and set it through the atomic queue limits APIs. Add a little helper that takes the alignment and pad into account to simplify the code that is touched a bit. Note that there never was any need for the > check in blk_queue_update_dma_pad, this probably was just copy and paste from dma_update_dma_alignment. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240626142637.300624-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-26 09:37:35 -06:00
Christoph Hellwig	73781b3b81	block: remove disk_update_readahead Mark blk_apply_bdi_limits non-static and open code disk_update_readahead in the only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240626142637.300624-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-26 09:37:35 -06:00
Christoph Hellwig	3302f6f090	block: conding style fixup for blk_queue_max_guaranteed_bio "static" never goes on a line of its own. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240626142637.300624-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-26 09:37:35 -06:00
Christoph Hellwig	fcf865e357	block: convert features and flags to __bitwise types ... and let sparse help us catch mismatches or abuses. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240626142637.300624-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-26 09:37:35 -06:00
Christoph Hellwig	ec9b1cf0b0	block: rename BLK_FEAT_MISALIGNED This is a flag for ->flags and not a feature for ->features. And fix the one place that actually incorrectly cleared it from ->features. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240626142637.300624-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-26 09:37:35 -06:00
Christoph Hellwig	78887d004f	block: correctly report cache type Check the features flag and the override flag using the blk_queue_write_cache, helper otherwise we're going to always report "write through". Fixes: `1122c0c1cc` ("block: move cache control settings out of queue->flags") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240626142637.300624-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-26 09:37:35 -06:00
John Garry	8324bb755a	block: Fix blk_validate_atomic_write_limits() build for arm32 For arm32, we get the following build warning: In file included from /tmp/next/build/include/linux/printk.h:10, from /tmp/next/build/include/linux/kernel.h:31, from /tmp/next/build/block/blk-settings.c:5: /tmp/next/build/block/blk-settings.c: In function 'blk_validate_atomic_write_limits': /tmp/next/build/include/asm-generic/div64.h:222:35: warning: comparison of distinct pointer types lacks a cast 222 \| (void)(((typeof((n)) )0) == ((uint64_t )0)); \ \| ^~ The divident for do_div() should be 64b, which it is not. Since we want to check 2x unsigned ints, just use % operator. This allows us to drop the chunk_sectors variable. Fixes: `9da3d1e912` ("block: Add core atomic write support") Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/linux-next/b765d200-4e0f-48b1-a962-7dfa1c4aef9c@kernel.dk/T/#mbf067b1edd89c7f9d7dac6e258c516199953a108 Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240621183016.3092518-1-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-21 12:31:55 -06:00
Damien Le Moal	b6cfe2287d	block: Define bdev_nr_zones() as an inline function There is no need for bdev_nr_zones() to be an exported function calculating the number of zones of a block device. Instead, given that all callers use this helper with a fully initialized block device that has a gendisk, we can redefine this function as an inline helper in blkdev.h. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20240621031506.759397-3-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-21 08:26:35 -06:00
John Garry	caf336f81b	block: Add fops atomic write support Support atomic writes by submitting a single BIO with the REQ_ATOMIC set. It must be ensured that the atomic write adheres to its rules, like naturally aligned offset, so call blkdev_dio_invalid() -> blkdev_atomic_write_valid() [with renaming blkdev_dio_unaligned() to blkdev_dio_invalid()] for this purpose. The BIO submission path currently checks for atomic writes which are too large, so no need to check here. In blkdev_direct_IO(), if the nr_pages exceeds BIO_MAX_VECS, then we cannot produce a single BIO, so error in this case. Finally set FMODE_CAN_ATOMIC_WRITE when the bdev can support atomic writes and the associated file flag is for O_DIRECT. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20240620125359.2684798-8-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-20 15:19:17 -06:00
Prasad Singamsetty	9abcfbd235	block: Add atomic write support for statx Extend statx system call to return additional info for atomic write support support if the specified file is a block device. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Prasad Singamsetty <prasad.singamsetty@oracle.com> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20240620125359.2684798-7-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-20 15:19:17 -06:00
John Garry	9da3d1e912	block: Add core atomic write support Add atomic write support, as follows: - add helper functions to get request_queue atomic write limits - report request_queue atomic write support limits to sysfs and update Doc - support to safely merge atomic writes - deal with splitting atomic writes - misc helper functions - add a per-request atomic write flag New request_queue limits are added, as follows: - atomic_write_hw_max is set by the block driver and is the maximum length of an atomic write which the device may support. It is not necessarily a power-of-2. - atomic_write_max_sectors is derived from atomic_write_hw_max_sectors and max_hw_sectors. It is always a power-of-2. Atomic writes may be merged, and atomic_write_max_sectors would be the limit on a merged atomic write request size. This value is not capped at max_sectors, as the value in max_sectors can be controlled from userspace, and it would only cause trouble if userspace could limit atomic_write_unit_max_bytes and the other atomic write limits. - atomic_write_hw_unit_{min,max} are set by the block driver and are the min/max length of an atomic write unit which the device may support. They both must be a power-of-2. Typically atomic_write_hw_unit_max will hold the same value as atomic_write_hw_max. - atomic_write_unit_{min,max} are derived from atomic_write_hw_unit_{min,max}, max_hw_sectors, and block core limits. Both min and max values must be a power-of-2. - atomic_write_hw_boundary is set by the block driver. If non-zero, it indicates an LBA space boundary at which an atomic write straddles no longer is atomically executed by the disk. The value must be a power-of-2. Note that it would be acceptable to enforce a rule that atomic_write_hw_boundary_sectors is a multiple of atomic_write_hw_unit_max, but the resultant code would be more complicated. All atomic writes limits are by default set 0 to indicate no atomic write support. Even though it is assumed by Linux that a logical block can always be atomically written, we ignore this as it is not of particular interest. Stacked devices are just not supported either for now. An atomic write must always be submitted to the block driver as part of a single request. As such, only a single BIO must be submitted to the block layer for an atomic write. When a single atomic write BIO is submitted, it cannot be split. As such, atomic_write_unit_{max, min}_bytes are limited by the maximum guaranteed BIO size which will not be required to be split. This max size is calculated by request_queue max segments and the number of bvecs a BIO can fit, BIO_MAX_VECS. Currently we rely on userspace issuing a write with iovcnt=1 for pwritev2() - as such, we can rely on each segment containing PAGE_SIZE of data, apart from the first+last, which each can fit logical block size of data. The first+last will be LBS length/aligned as we rely on direct IO alignment rules also. New sysfs files are added to report the following atomic write limits: - atomic_write_unit_max_bytes - same as atomic_write_unit_max_sectors in bytes - atomic_write_unit_min_bytes - same as atomic_write_unit_min_sectors in bytes - atomic_write_boundary_bytes - same as atomic_write_hw_boundary_sectors in bytes - atomic_write_max_bytes - same as atomic_write_max_sectors in bytes Atomic writes may only be merged with other atomic writes and only under the following conditions: - total resultant request length <= atomic_write_max_bytes - the merged write does not straddle a boundary Helper function bdev_can_atomic_write() is added to indicate whether atomic writes may be issued to a bdev. If a bdev is a partition, the partition start must be aligned with both atomic_write_unit_min_sectors and atomic_write_hw_boundary_sectors. FSes will rely on the block layer to validate that an atomic write BIO submitted will be of valid size, so add blk_validate_atomic_write_op_size() for this purpose. Userspace expects an atomic write which is of invalid size to be rejected with -EINVAL, so add BLK_STS_INVAL for this. Also use BLK_STS_INVAL for when a BIO needs to be split, as this should mean an invalid size BIO. Flag REQ_ATOMIC is used for indicating an atomic write. Co-developed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20240620125359.2684798-6-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-20 15:19:17 -06:00
John Garry	f70167a7a6	block: Generalize chunk_sectors support as boundary support The purpose of the chunk_sectors limit is to ensure that a mergeble request fits within the boundary of the chunck_sector value. Such a feature will be useful for other request_queue boundary limits, so generalize the chunk_sectors merge code. This idea was proposed by Hannes Reinecke. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20240620125359.2684798-3-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-20 15:19:17 -06:00
John Garry	8d1dfd51c8	block: Pass blk_queue_get_max_sectors() a request pointer Currently blk_queue_get_max_sectors() is passed a enum req_op. In future the value returned from blk_queue_get_max_sectors() may depend on certain request flags, so pass a request pointer. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20240620125359.2684798-2-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-20 15:19:17 -06:00
Jens Axboe	e821bcecdf	Merge branch 'for-6.11/block-limits' into for-6.11/block Merge in queue limits cleanups. * for-6.11/block-limits: block: move the raid_partial_stripes_expensive flag into the features field block: remove the discard_alignment flag block: move the misaligned flag into the features field block: renumber and rename the cache disabled flag block: fix spelling and grammar for in writeback_cache_control.rst block: remove the unused blk_bounce enum	2024-06-20 06:55:20 -06:00
Christoph Hellwig	7d4dec525f	block: move the raid_partial_stripes_expensive flag into the features field Move the raid_partial_stripes_expensive flags into the features field to reclaim a little bit of space. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20240619154623.450048-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-20 06:53:15 -06:00
Christoph Hellwig	4cac3d3a71	block: remove the discard_alignment flag queue_limits.discard_alignment is never read except in the places where it is stacked into another limit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20240619154623.450048-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-20 06:53:14 -06:00
Christoph Hellwig	5543217be4	block: move the misaligned flag into the features field Move the misaligned flags into the features field to reclaim a little bit of space. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20240619154623.450048-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-20 06:53:14 -06:00
Christoph Hellwig	bae1c74316	block: renumber and rename the cache disabled flag Start with the first bit, and drop the plural-S from the name. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20240619154623.450048-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-20 06:53:14 -06:00
Jens Axboe	69c34f07e4	Merge branch 'for-6.11/block-limits' into for-6.11/block Merge in last round of queue limits changes from Christoph. * for-6.11/block-limits: (26 commits) block: move the bounce flag into the features field block: move the skip_tagset_quiesce flag to queue_limits block: move the pci_p2pdma flag to queue_limits block: move the zone_resetall flag to queue_limits block: move the zoned flag into the features field block: move the poll flag to queue_limits block: move the dax flag to queue_limits block: move the nowait flag to queue_limits block: move the synchronous flag to queue_limits block: move the stable_writes flag to queue_limits block: move the io_stat flag setting to queue_limits block: move the add_random flag to queue_limits block: move the nonrot flag to queue_limits block: move cache control settings out of queue->flags block: remove blk_flush_policy block: freeze the queue in queue_attr_store nbd: move setting the cache control flags to __nbd_set_size virtio_blk: remove virtblk_update_cache_mode loop: fold loop_update_rotational into loop_reconfigure_limits loop: also use the default block size from an underlying block device ... Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-19 08:14:49 -06:00
Christoph Hellwig	339d3948c0	block: move the bounce flag into the features field Move the bounce flag into the features field to reclaim a little bit of space. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-27-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-19 07:58:28 -06:00
Christoph Hellwig	8c8f5c85b2	block: move the skip_tagset_quiesce flag to queue_limits Move the skip_tagset_quiesce flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-26-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-19 07:58:28 -06:00
Christoph Hellwig	9c1e42e3c8	block: move the pci_p2pdma flag to queue_limits Move the pci_p2pdma flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-19 07:58:28 -06:00
Christoph Hellwig	a52758a397	block: move the zone_resetall flag to queue_limits Move the zone_resetall flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-24-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-19 07:58:28 -06:00
Christoph Hellwig	b1fc937a55	block: move the zoned flag into the features field Move the zoned flags into the features field to reclaim a little bit of space. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-23-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-19 07:58:28 -06:00

1 2 3 4 5 ...

7384 Commits