linux

Author	SHA1	Message	Date
Baolin Wang	7901601aef	blk-throttle: Avoid getting the current time if tg->last_finish_time is 0 We only update the tg->last_finish_time when the low limitaion is enabled, so we can move the tg->last_finish_time validation a little forward to avoid getting the unnecessary current time stamp if the the low limitation is not enabled. Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-10-08 08:01:37 -06:00
Baolin Wang	4247d9c8ba	blk-throttle: Remove a meaningless parameter for throtl_downgrade_state() The throtl_downgrade_state() is always used to change to LIMIT_LOW limitation, thus remove the latter meaningless parameter which indicates the limitation index. Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-10-08 08:01:37 -06:00
Baolin Wang	fa1c3eaf4d	block: Remove redundant 'return' statement Remove redundant 'return' statement for 'void' functions. Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-10-08 07:59:48 -06:00
Johannes Thumshirn	fe6f0cdc49	block: soft limit zone-append sectors as well Martin rightfully noted that for normal filesystem IO we have soft limits in place, to prevent them from getting too big and not lead to unpredictable latencies. For zone append we only have the hardware limit in place. Cap the max sectors we submit via zone-append to the maximal number of sectors if the second limit is lower. Reported-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/linux-btrfs/yq1k0w8g3rw.fsf@ca-mkp.ca.oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-10-07 08:22:15 -06:00
Gabriel Krisman Bertazi	a926c7afff	block: Consider only dispatched requests for inflight statistic According to Documentation/block/stat.rst, inflight should not include I/O requests that are in the queue but not yet dispatched to the device, but blk-mq identifies as inflight any request that has a tag allocated, which, for queues without elevator, happens at request allocation time and before it is queued in the ctx (default case in blk_mq_submit_bio). In addition, current behavior is different for queues with elevator from queues without it, since for the former the driver tag is allocated at dispatch time. A more precise approach would be to only consider requests with state MQ_RQ_IN_FLIGHT. This effectively reverts commit `6131837b1d` ("blk-mq: count allocated but not started requests in iostats inflight") to consolidate blk-mq behavior with itself (elevator case) and with original documentation, but it differs from the behavior used by the legacy path. This version differs from v1 by using blk_mq_rq_state to access the state attribute. Avoid using blk_mq_request_started, which was suggested, since we don't want to include MQ_RQ_COMPLETE. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Cc: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-10-06 14:36:35 -06:00
Christoph Hellwig	eda5cc997a	block: move blk_mq_sched_try_merge to blk-merge.c Move blk_mq_sched_try_merge to blk-merge.c, which allows to mark a lot of the merge infrastructure static there. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-10-06 07:29:53 -06:00
Christoph Hellwig	d59da41998	block: remove the unused blk_integrity_merge_bio export Also move the definition from the public blkdev.h to the private block/blk.h header. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-10-06 07:29:53 -06:00
Christoph Hellwig	92cf2fd156	block: remove the unused blk_integrity_merge_rq export Also move the definition from the public blkdev.h to the private block/blk.h header. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-10-06 07:29:53 -06:00
Ming Lei	0549e87c30	block: move 'q_usage_counter' into front of 'request_queue' The field of 'q_usage_counter' is always fetched in fast path of every block driver, and move it into front of 'request_queue', so it can be fetched into 1st cacheline of 'request_queue' instance. Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Veronika Kabatova <vkabatov@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-10-06 07:29:36 -06:00
Ming Lei	2b0d3d3e4f	percpu_ref: reduce memory footprint of percpu_ref in fast path 'struct percpu_ref' is often embedded into one user structure, and the instance is usually referenced in fast path, however actually only 'percpu_count_ptr' is needed in fast path. So move other fields into one new structure of 'percpu_ref_data', and allocate it dynamically via kzalloc(), then memory footprint of 'percpu_ref' in fast path is reduced a lot and becomes suitable to put into hot cacheline of user structure. Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Veronika Kabatova <vkabatov@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-10-06 07:29:36 -06:00
Eric Biggers	cf785af193	block: warn if !__GFP_DIRECT_RECLAIM in bio_crypt_set_ctx() bio_crypt_set_ctx() assumes its gfp_mask argument always includes __GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed. For now this assumption is still fine, since no callers violate it. Making bio_crypt_set_ctx() able to fail would add unneeded complexity. However, if a caller didn't use __GFP_DIRECT_RECLAIM, it would be very hard to notice the bug. Make it easier by adding a WARN_ON_ONCE(). Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Satya Tangirala <satyat@google.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Satya Tangirala <satyat@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-10-05 10:47:43 -06:00
Eric Biggers	93f221ae08	block: make blk_crypto_rq_bio_prep() able to fail blk_crypto_rq_bio_prep() assumes its gfp_mask argument always includes __GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed. However, blk_crypto_rq_bio_prep() might be called with GFP_ATOMIC via setup_clone() in drivers/md/dm-rq.c. This case isn't currently reachable with a bio that actually has an encryption context. However, it's fragile to rely on this. Just make blk_crypto_rq_bio_prep() able to fail. Suggested-by: Satya Tangirala <satyat@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Satya Tangirala <satyat@google.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-10-05 10:47:43 -06:00
Eric Biggers	07560151db	block: make bio_crypt_clone() able to fail bio_crypt_clone() assumes its gfp_mask argument always includes __GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed. However, bio_crypt_clone() might be called with GFP_ATOMIC via setup_clone() in drivers/md/dm-rq.c, or with GFP_NOWAIT via kcryptd_io_read() in drivers/md/dm-crypt.c. Neither case is currently reachable with a bio that actually has an encryption context. However, it's fragile to rely on this. Just make bio_crypt_clone() able to fail, analogous to bio_integrity_clone(). Reported-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Satya Tangirala <satyat@google.com> Cc: Satya Tangirala <satyat@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-10-05 10:47:43 -06:00
Christoph Hellwig	10ed16662d	block: add a bdget_part helper All remaining callers of bdget() outside of fs/block_dev.c want to get a reference to the struct block_device for a given struct hd_struct. Add a helper just for that and then mark bdget static. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-10-05 10:38:33 -06:00
Christoph Hellwig	155bd9d1ab	drbd: remove ->this_bdev DRBD keeps a block device open just to get and set the capacity from it. Switch to primarily using the disk capacity as intended by the block layer, and sync it to the bdev using revalidate_disk_size. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-10-05 10:38:33 -06:00
yangerkun	76cffccd60	block-mq: fix comments in blk_mq_queue_tag_busy_iter 'f5bbbbe4d635 ("blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter")' introduce a bug what we may sleep between rcu lock. Then '530ca2c9bd69 ("blk-mq: Allow blocking queue tag iter callbacks")' fix it by get request_queue's ref. And 'a9a808084d6a ("block: Remove the synchronize_rcu() call from __blk_mq_update_nr_hw_queues()")' remove the synchronize_rcu in __blk_mq_update_nr_hw_queues. We need update the confused comments in blk_mq_queue_tag_busy_iter. Signed-off-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-29 08:11:00 -06:00
Xianting Tian	8229cca8c3	blk-mq: add cond_resched() in __blk_mq_alloc_rq_maps() We found blk_mq_alloc_rq_maps() takes more time in kernel space when testing nvme device hot-plugging. The test and anlysis as below. Debug code, 1, blk_mq_alloc_rq_maps(): u64 start, end; depth = set->queue_depth; start = ktime_get_ns(); pr_err("[%d:%s switch:%ld,%ld] queue depth %d, nr_hw_queues %d\n", current->pid, current->comm, current->nvcsw, current->nivcsw, set->queue_depth, set->nr_hw_queues); do { err = __blk_mq_alloc_rq_maps(set); if (!err) break; set->queue_depth >>= 1; if (set->queue_depth < set->reserved_tags + BLK_MQ_TAG_MIN) { err = -ENOMEM; break; } } while (set->queue_depth); end = ktime_get_ns(); pr_err("[%d:%s switch:%ld,%ld] all hw queues init cost time %lld ns\n", current->pid, current->comm, current->nvcsw, current->nivcsw, end - start); 2, __blk_mq_alloc_rq_maps(): u64 start, end; for (i = 0; i < set->nr_hw_queues; i++) { start = ktime_get_ns(); if (!__blk_mq_alloc_rq_map(set, i)) goto out_unwind; end = ktime_get_ns(); pr_err("hw queue %d init cost time %lld ns\n", i, end - start); } Test nvme hot-plugging with above debug code, we found it totally cost more than 3ms in kernel space without being scheduled out when alloc rqs for all 16 hw queues with depth 1023, each hw queue cost about 140-250us. The cost time will be increased with hw queue number and queue depth increasing. And in an extreme case, if __blk_mq_alloc_rq_maps() returns -ENOMEM, it will try "queue_depth >>= 1", more time will be consumed. [ 428.428771] nvme nvme0: pci function 10000:01:00.0 [ 428.428798] nvme 10000:01:00.0: enabling device (0000 -> 0002) [ 428.428806] pcieport 10000:00:00.0: can't derive routing for PCI INT A [ 428.428809] nvme 10000:01:00.0: PCI INT A: no GSI [ 432.593374] [4688:kworker/u33:8 switch:663,2] queue depth 30, nr_hw_queues 1 [ 432.593404] hw queue 0 init cost time 22883 ns [ 432.593408] [4688:kworker/u33:8 switch:663,2] all hw queues init cost time 35960 ns [ 432.595953] nvme nvme0: 16/0/0 default/read/poll queues [ 432.595958] [4688:kworker/u33:8 switch:700,2] queue depth 1023, nr_hw_queues 16 [ 432.596203] hw queue 0 init cost time 242630 ns [ 432.596441] hw queue 1 init cost time 235913 ns [ 432.596659] hw queue 2 init cost time 216461 ns [ 432.596877] hw queue 3 init cost time 215851 ns [ 432.597107] hw queue 4 init cost time 228406 ns [ 432.597336] hw queue 5 init cost time 227298 ns [ 432.597564] hw queue 6 init cost time 224633 ns [ 432.597785] hw queue 7 init cost time 219954 ns [ 432.597937] hw queue 8 init cost time 150930 ns [ 432.598082] hw queue 9 init cost time 143496 ns [ 432.598231] hw queue 10 init cost time 147261 ns [ 432.598397] hw queue 11 init cost time 164522 ns [ 432.598542] hw queue 12 init cost time 143401 ns [ 432.598692] hw queue 13 init cost time 148934 ns [ 432.598841] hw queue 14 init cost time 147194 ns [ 432.598991] hw queue 15 init cost time 148942 ns [ 432.598993] [4688:kworker/u33:8 switch:700,2] all hw queues init cost time 3035099 ns [ 432.602611] nvme0n1: p1 So use this patch to trigger schedule between each hw queue init, to avoid other threads getting stuck. It is not in atomic context when executing __blk_mq_alloc_rq_maps(), so it is safe to call cond_resched(). Signed-off-by: Xianting Tian <tian.xianting@h3c.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-28 09:01:51 -06:00
Tejun Heo	bec02dbbaf	iocost: consider iocgs with active delays for debt forgiveness An iocg may have 0 debt but non-zero delay. The current debt forgiveness logic doesn't act on such iocgs. This can lead to unexpected behaviors - an iocg with a little bit of debt will have its delay canceled through debt forgiveness but one w/o any debt but active delay will have to wait out until its delay decays out. This patch updates the debt handling logic so that it treats delays the same as debts. If either debt or delay is active, debt forgiveness logic kicks in and acts on both the same way. Also, avoid turning the debt and delay directly to zero as that can confuse state transitions. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-25 08:35:02 -06:00
Tejun Heo	c5a6561b8d	iocost: add iocg_forgive_debt tracepoint Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-25 08:35:02 -06:00
Tejun Heo	c7af2a003a	iocost: reimplement debt forgiveness using average usage Debt forgiveness logic was counting the number of consecutive !busy periods as the trigger condition. While this usually works, it can easily be thrown off by temporary fluctuations especially on configurations w/ short periods. This patch reimplements debt forgiveness so that: * Use the average usage over the forgiveness period instead of counting consecutive periods. * Debt is reduced at around the target rate (1/2 every 100ms) regardless of ioc period duration. * Usage threshold is raised to 50%. Combined with the preceding changes and the switch to average usage, this makes debt forgivness a lot more effective at reducing the amount of unnecessary idleness. * Constants are renamed with DFGV_ prefix. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-25 08:35:02 -06:00
Tejun Heo	d95178410b	iocost: recalculate delay after debt reduction Debt sets the initial delay duration which is decayed over time. The current debt reduction halved the debt but didn't change the delay. It prevented future debts from increasing delay but didn't do anything to lower the existing delay, limiting the mechanism's ability to reduce unnecessary idling. Reset iocg->delay to 0 after debt reduction so that iocg_kick_waitq() recalculates new delay value based on the reduced debt amount. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-25 08:35:02 -06:00
Tejun Heo	33a1fe6d82	iocost: replace nr_shortages cond in ioc_forgive_debts() with busy_level one Debt reduction was blocked if any iocg was short on budget in the past period to avoid reducing debts while some iocgs are saturated. However, this ends up unnecessarily blocking debt reduction due to temporary local imbalances when the device is generally being underutilized, while also failing to block when the underlying device is overwhelmed and the usage becomes low from high latency. Given that debt accumulation mostly happens with swapout bursts which can significantly deteriorate the underlying device's latency response, the current logic is not great. Let's replace it with ioc->busy_level based condition so that we block debt reduction when the underlying device is being saturated. ioc_forgive_debts() call is moved after busy_level determination. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-25 08:35:02 -06:00
Tejun Heo	ab8df828b5	iocost: factor out ioc_forgive_debts() Debt reduction logic is going to be improved and expanded. Factor it out into ioc_forgive_debts() and generalize the comment a bit. No functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-25 08:35:02 -06:00
Konstantin Khlebnikov	6abc49468e	dm: add support for REQ_NOWAIT and enable it for linear target Add DM target feature flag DM_TARGET_NOWAIT which advertises that target works with REQ_NOWAIT bios. Add dm_table_supports_nowait() and update dm_table_set_restrictions() to set/clear QUEUE_FLAG_NOWAIT accordingly. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-25 08:20:03 -06:00
Mike Snitzer	021a24460d	block: add QUEUE_FLAG_NOWAIT Add QUEUE_FLAG_NOWAIT to allow a block device to advertise support for REQ_NOWAIT. Bio-based devices may set QUEUE_FLAG_NOWAIT where applicable. Update QUEUE_FLAG_MQ_DEFAULT to include QUEUE_FLAG_NOWAIT. Also update submit_bio_checks() to verify it is set for REQ_NOWAIT bios. Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-25 08:20:03 -06:00
Christoph Hellwig	700cd59db5	vsprintf: use bd_partno in bdev_name No need to go through the hd_struct to find the partition number. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-25 08:18:58 -06:00
Christoph Hellwig	8a63a86e1f	block: use bd_partno in bdevname No need to go through the hd_struct to find the partition number. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-25 08:18:58 -06:00
Christoph Hellwig	57ba105920	target/iblock: fix holder printing in iblock_show_configfs_dev_params bd_contains is never NULL for an open block device. In addition ibd_bd is always set to a block device that was exclusively opened by the target code, so the holder is guranteed to be ib_dev as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-25 08:18:58 -06:00
Christoph Hellwig	74f9445409	drbd: don't set ->bd_contains The ->bd_contains field is set by __blkdev_get and drivers have no business manipulating it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-25 08:18:58 -06:00
Christoph Hellwig	8c40c7c483	drbd: don't detour through bd_contains for the gendisk bd_disk is set on all block devices, including those for partitions. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-25 08:18:58 -06:00
Christoph Hellwig	4245e52d25	md: don't detour through bd_contains for the gendisk bd_disk is set on all block devices, including those for partitions. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-25 08:18:57 -06:00
Christoph Hellwig	61a27e1f52	md: compare bd_disk instead of bd_contains To check for partitions of the same disk bd_contains works as well, but bd_disk is way more obvious. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-25 08:18:57 -06:00
Christoph Hellwig	fa01b1e973	block: add a bdev_is_partition helper Add a littler helper to make the somewhat arcane bd_contains checks a little more obvious. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-25 08:18:57 -06:00
Christoph Hellwig	250eec9e39	Documentation/hdio: fix up obscure bd_contains references bd_contains is an implementation detail and should not be mentioned in a userspace API documentation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-25 08:18:57 -06:00
Christoph Hellwig	f56753ac2a	bdi: replace BDI_CAP_NO_{WRITEBACK,ACCT_DIRTY} with a single flag Replace the two negative flags that are always used together with a single positive flag that indicates the writeback capability instead of two related non-capabilities. Also remove the pointless wrappers to just check the flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-24 13:43:39 -06:00
Christoph Hellwig	823423ef55	bdi: invert BDI_CAP_NO_ACCT_WB Replace BDI_CAP_NO_ACCT_WB with a positive BDI_CAP_WRITEBACK_ACCT to make the checks more obvious. Also remove the pointless bdi_cap_account_writeback wrapper that just obsfucates the check. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-24 13:43:39 -06:00
Christoph Hellwig	1cb039f3dc	bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag The BDI_CAP_STABLE_WRITES is one of the few bits of information in the backing_dev_info shared between the block drivers and the writeback code. To help untangling the dependency replace it with a queue flag and a superblock flag derived from it. This also helps with the case of e.g. a file system requiring stable writes due to its own checksumming, but not forcing it on other users of the block device like the swap code. One downside is that we an't support the stable_pages_required bdi attribute in sysfs anymore. It is replaced with a queue attribute which also is writable for easier testing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-24 13:43:39 -06:00
Christoph Hellwig	5115db10a8	mm: use SWP_SYNCHRONOUS_IO more intelligently There is no point in trying to call bdev_read_page if SWP_SYNCHRONOUS_IO is not set, as the device won't support it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-24 13:43:39 -06:00
Christoph Hellwig	a8b456d01c	bdi: remove BDI_CAP_SYNCHRONOUS_IO BDI_CAP_SYNCHRONOUS_IO is only checked in the swap code, and used to decided if ->rw_page can be used on a block device. Just check up for the method instead. The only complication is that zram needs a second set of block_device_operations as it can switch between modes that actually support ->rw_page and those who don't. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-24 13:43:39 -06:00
Christoph Hellwig	ed7b6b4f6e	bdi: remove BDI_CAP_CGROUP_WRITEBACK Just checking SB_I_CGROUPWB for cgroup writeback support is enough. Either the file system allocates its own bdi (e.g. btrfs), in which case it is known to support cgroup writeback, or the bdi comes from the block layer, which always supports cgroup writeback. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-24 13:43:39 -06:00
Christoph Hellwig	c2e4cd57cf	block: lift setting the readahead size into the block layer Drivers shouldn't really mess with the readahead size, as that is a VM concept. Instead set it based on the optimal I/O size by lifting the algorithm from the md driver when registering the disk. Also set bdi->io_pages there as well by applying the same scheme based on max_sectors. To ensure the limits work well for stacking drivers a new helper is added to update the readahead limits from the block limits, which is also called from disk_stack_limits. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-24 13:43:39 -06:00
Christoph Hellwig	16ef510139	md: update the optimal I/O size on reshape The raid5 and raid10 drivers currently update the read-ahead size, but not the optimal I/O size on reshape. To prepare for deriving the read-ahead size from the optimal I/O size make sure it is updated as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-24 13:43:39 -06:00
Christoph Hellwig	55b2598e84	bdi: initialize ->ra_pages and ->io_pages in bdi_init Set up a readahead size by default, as very few users have a good reason to change it. This means code, ecryptfs, and orangefs now set up the values while they were previously missing it, while ubifs, mtd and vboxsf manually set it to 0 to avoid readahead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Richard Weinberger <richard@nod.at> [ubifs, mtd] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-24 13:43:39 -06:00
Christoph Hellwig	9e82d35b95	aoe: set an optimal I/O size aoe forces a larger readahead size, but any reason to do larger I/O is not limited to readahead. Also set the optimal I/O size, and remove the local constants in favor of just using SZ_2G. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-24 13:43:38 -06:00
Christoph Hellwig	5d4ce78b25	bcache: inherit the optimal I/O size Inherit the optimal I/O size setting just like the readahead window, as any reason to do larger I/O does not apply to just readahead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-24 13:43:38 -06:00
Christoph Hellwig	b807a2c5e0	drbd: remove dead code in device_to_statistics Ever since the switch to blk-mq, a lower device not used for VM writeback will not be marked congested, so the check will never trigger. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-24 13:43:38 -06:00
Christoph Hellwig	402dd2cf46	fs: remove the unused SB_I_MULTIROOT flag The last user of SB_I_MULTIROOT is disappeared with commit `f2aedb713c` ("NFS: Add fs_context support.") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-24 13:43:38 -06:00
Christoph Hellwig	1fb1a2ad75	block: mark blkdev_get static There are no users outside the core block code left now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-23 10:43:19 -06:00
Christoph Hellwig	36daaa98f7	PM: mm: cleanup swsusp_swap_check Use blkdev_get_by_dev instead of bdget + blkdev_get. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-23 10:43:19 -06:00
Christoph Hellwig	21bd900572	mm: split swap_type_of swap_type_of is used for two entirely different purposes: (1) check what swap type a given device/offset corresponds to (2) find the first available swap device that can be written to Mixing both in a single function creates an unreadable mess. Create two separate functions instead, and switch both to pass a dev_t instead of a struct block_device to further simplify the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-09-23 10:43:19 -06:00

1 2 3 4 5 ...

949822 Commits