linux

Author	SHA1	Message	Date
Ilya Dryomov	8ae0299a4b	rbd: don't mess with a page vector in rbd_notify_op_lock() rbd_notify_op_lock() isn't interested in a notify reply. Instead of accepting that page vector just to free it, have watch-notify code take care of it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>	2020-04-13 08:55:49 +02:00
Ilya Dryomov	b877605152	rbd: don't test rbd_dev->opts in rbd_dev_image_release() rbd_dev->opts is used to distinguish between the image that is being mapped and a parent. However, because we no longer establish watch for read-only mappings, this test is imprecise and results in unnecessary rbd_unregister_watch() calls. Make it consistent with need_watch in rbd_dev_image_probe(). Fixes: `b9ef2b8858` ("rbd: don't establish watch for read-only mappings") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>	2020-04-13 08:55:49 +02:00
Ilya Dryomov	952c48b0ed	rbd: call rbd_dev_unprobe() after unwatching and flushing notifies rbd_dev_unprobe() is supposed to undo most of rbd_dev_image_probe(), including rbd_dev_header_info(), which means that rbd_dev_header_info() isn't supposed to be called after rbd_dev_unprobe(). However, rbd_dev_image_release() calls rbd_dev_unprobe() before rbd_unregister_watch(). This is racy because a header update notify can sneak in: "rbd unmap" thread ceph-watch-notify worker rbd_dev_image_release() rbd_dev_unprobe() free and zero out header rbd_watch_cb() rbd_dev_refresh() rbd_dev_header_info() read in header The same goes for "rbd map" because rbd_dev_image_probe() calls rbd_dev_unprobe() on errors. In both cases this results in a memory leak. Fixes: `fd22aef8b4` ("rbd: move rbd_unregister_watch() call into rbd_dev_image_release()") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>	2020-04-13 08:55:49 +02:00
Ilya Dryomov	0e4e1de5b6	rbd: avoid a deadlock on header_rwsem when flushing notifies rbd_unregister_watch() flushes notifies and therefore cannot be called under header_rwsem because a header update notify takes header_rwsem to synchronize with "rbd map". If mapping an image fails after the watch is established and a header update notify sneaks in, we deadlock when erroring out from rbd_dev_image_probe(). Move watch registration and unregistration out of the critical section. The only reason they were put there was to make header_rwsem management slightly more obvious. Fixes: `811c668877` ("rbd: fix rbd map vs notify races") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>	2020-04-13 08:55:49 +02:00
Linus Torvalds	e6383b185a	xen: branch for v5.7-rc1b -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXpAQNgAKCRCAXGG7T9hj voLNAP9VWlSX7Whn4o9fndit2HyqDpOo7fQKiuU4XtDd++FG6QD/Zcu201B8ZP8M rkbeFthX+W9PAyZ0itf1vCL4fQoR7gw= =pRJH -----END PGP SIGNATURE----- Merge tag 'for-linus-5.7-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull more xen updates from Juergen Gross: - two cleanups - fix a boot regression introduced in this merge window - fix wrong use of memory allocation flags * tag 'for-linus-5.7-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: fix booting 32-bit pv guest x86/xen: make xen_pvmmu_arch_setup() static xen/blkfront: fix memory allocation flags in blkfront_setup_indirect() xen: Use evtchn_type_t as a type for event channels	2020-04-10 17:20:06 -07:00
Linus Torvalds	8df2a0a6da	block-5.7-2020-04-10 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl6QhDIQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpsE/EADOQ0xDMOa8EmzRvjuCkiaB9yK2zXiBSAj5 ZBi7ReownfXhCR7nVc8Bv1s2f00PD6CFNURXZmdgyDDrXEd2ojueDoAZNBk59t0e i2CAF2wLAQ5EfuVaxSHVEOrVEmtu+ue+Ix83JNlnGPd7pf9s7uKc/W4iKGpgpxIo 1CpXmWwm5RwjX4z/Qsiaka2lB7QojjImp1n3C+XI5+pp/bJXiftep1lxH5Y3nSWU iR4jO81uxDMxhTEZ9z2cb1HarhctKvnihcb39gQYQ/kYYu7hSZnBPZo5zp5Dyb/t 4tGuDsfXCQCbF0smkusUrcyeT19vh9tOsGkiMzJ/ihm7TMyN4fT23h6DUb/7pAON jnlcB7r5Ezs8jLz9i+mAoq06djd5u54kiuKFog8170sTrtYsncZbyc01wLNAla/V /6KX1sMbPlbXZ+a3l3i7i/gcCBJ7ci6pV3x2elvM9dKHxyqJmwEGMlFVwt4s26ev wS+7+dktLAC73889Zyn8LutA4bWy5FmisSPA4PydSUSOZA+7JjlbILcz15jjwlP2 HzYk+TXsd3yJUQRYX5P0FcDaBUTISr/xeUUB+KT1rLv4Lhtso+S/9cvSc8x5mOa9 989gmqNfFAWoj1nKEIKeRwLjk0b6YA9qMv4jOwwiuobsT55aBxpbP80huNoRVj5L xFIWgBSwzg== =3woC -----END PGP SIGNATURE----- Merge tag 'block-5.7-2020-04-10' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Here's a set of fixes that should go into this merge window. This contains: - NVMe pull request from Christoph with various fixes - Better discard support for loop (Evan) - Only call ->commit_rqs() if we have queued IO (Keith) - blkcg offlining fixes (Tejun) - fix (and fix the fix) for busy partitions" * tag 'block-5.7-2020-04-10' of git://git.kernel.dk/linux-block: block: fix busy device checking in blk_drop_partitions again block: fix busy device checking in blk_drop_partitions nvmet-rdma: fix double free of rdma queue blk-mq: don't commit_rqs() if none were queued nvme-fc: Revert "add module to ops template to allow module references" nvme: fix deadlock caused by ANA update wrong locking nvmet-rdma: fix bonding failover possible NULL deref loop: Better discard support for block devices loop: Report EOPNOTSUPP properly nvmet: fix NULL dereference when removing a referral nvme: inherit stable pages constraint in the mpath stack device blkcg: don't offline parent blkcg first blkcg: rename blkcg->cgwb_refcnt to ->online_pin and always use it nvme-tcp: fix possible crash in recv error flow nvme-tcp: don't poll a non-live queue nvme-tcp: fix possible crash in write_zeroes processing nvmet-fc: fix typo in comment nvme-rdma: Replace comma with a semicolon nvme-fcloop: fix deallocation of working context nvme: fix compat address handling in several ioctls	2020-04-10 10:06:54 -07:00
Linus Torvalds	fcc95f0640	The main items are: - support for asynchronous create and unlink (Jeff Layton). Creates and unlinks are satisfied locally, without waiting for a reply from the MDS, provided the client has been granted appropriate caps (new in v15.y.z ("Octopus") release). This can be a big help for metadata heavy workloads such as tar and rsync. Opt-in with the new nowsync mount option. - multiple blk-mq queues for rbd (Hannes Reinecke and myself). When the driver was converted to blk-mq, we settled on a single blk-mq queue because of a global lock in libceph and some other technical debt. These have since been addressed, so allocate a queue per CPU to enhance parallelism. - don't hold onto caps that aren't actually needed (Zheng Yan). This has been our long-standing behavior, but it causes issues with some active/standby applications (synchronous I/O, stalls if the standby goes down, etc). - .snap directory timestamps consistent with ceph-fuse (Luis Henriques) -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAl6OEO4THGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzi0XNB/wItYipkjlL5fIUBqiRWzYai72DWdPp CnOZo8LB+O0MQDomPT6DdpU1OMlWZi5HF7zklrZ35LTm21UkRNC9zvccjs9l66PJ qo9cKJbxxju+hgzIvgvK9PjlDlaiFAc/pkF8lZ/NaOnSsM1vvsFL9IuY2LXS38MY A/uUTZNUnFy5udam8TPuN+gWwZcUIH48lRWQLWe2I/hNJSweX1l8OHvecOBg+cYH G+8vb7mLU2V9ky0YT5JJmVxUV3CWA5wH6ZrWWy1ofVDdeSFLPrhgWX6IMjaNq+Gd xPfxmly47uBviSqON9dMkiThgy0Qj7yi0Pvx+1sAZbD7aj/6A4qg3LX5 =GIX0 -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.7-rc1' of git://github.com/ceph/ceph-client Pull ceph updates from Ilya Dryomov: "The main items are: - support for asynchronous create and unlink (Jeff Layton). Creates and unlinks are satisfied locally, without waiting for a reply from the MDS, provided the client has been granted appropriate caps (new in v15.y.z ("Octopus") release). This can be a big help for metadata heavy workloads such as tar and rsync. Opt-in with the new nowsync mount option. - multiple blk-mq queues for rbd (Hannes Reinecke and myself). When the driver was converted to blk-mq, we settled on a single blk-mq queue because of a global lock in libceph and some other technical debt. These have since been addressed, so allocate a queue per CPU to enhance parallelism. - don't hold onto caps that aren't actually needed (Zheng Yan). This has been our long-standing behavior, but it causes issues with some active/standby applications (synchronous I/O, stalls if the standby goes down, etc). - .snap directory timestamps consistent with ceph-fuse (Luis Henriques)" * tag 'ceph-for-5.7-rc1' of git://github.com/ceph/ceph-client: (49 commits) ceph: fix snapshot directory timestamps ceph: wait for async creating inode before requesting new max size ceph: don't skip updating wanted caps when cap is stale ceph: request new max size only when there is auth cap ceph: cleanup return error of try_get_cap_refs() ceph: return ceph_mdsc_do_request() errors from __get_parent() ceph: check all mds' caps after page writeback ceph: update i_requested_max_size only when sending cap msg to auth mds ceph: simplify calling of ceph_get_fmode() ceph: remove delay check logic from ceph_check_caps() ceph: consider inode's last read/write when calculating wanted caps ceph: always renew caps if mds_wanted is insufficient ceph: update dentry lease for async create ceph: attempt to do async create when possible ceph: cache layout in parent dir on first sync create ceph: add new MDS req field to hold delegated inode number ceph: decode interval_sets for delegated inos ceph: make ceph_fill_inode non-static ceph: perform asynchronous unlink if we have sufficient caps ceph: don't take refs to want mask unless we have all bits ...	2020-04-08 21:44:05 -07:00
Juergen Gross	3a169c0be7	xen/blkfront: fix memory allocation flags in blkfront_setup_indirect() Commit `1d5c76e664` ("xen-blkfront: switch kcalloc to kvcalloc for large array allocation") didn't fix the issue it was meant to, as the flags for allocating the memory are GFP_NOIO, which will lead the memory allocation falling back to kmalloc(). So instead of GFP_NOIO use GFP_KERNEL and do all the memory allocation in blkfront_setup_indirect() in a memalloc_noio_{save,restore} section. Fixes: `1d5c76e664` ("xen-blkfront: switch kcalloc to kvcalloc for large array allocation") Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20200403090034.8753-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>	2020-04-07 12:12:58 +02:00
Evan Green	c52abf5630	loop: Better discard support for block devices If the backing device for a loop device is itself a block device, then mirror the "write zeroes" capabilities of the underlying block device into the loop device. Copy this capability into both max_write_zeroes_sectors and max_discard_sectors of the loop device. The reason for this is that REQ_OP_DISCARD on a loop device translates into blkdev_issue_zeroout(), rather than blkdev_issue_discard(). This presents a consistent interface for loop devices (that discarded data is zeroed), regardless of the backing device type of the loop device. There should be no behavior change for loop devices backed by regular files. This change fixes blktest block/003, and removes an extraneous error print in block/013 when testing on a loop device backed by a block device that does not support discard. Signed-off-by: Evan Green <evgreen@chromium.org> Reviewed-by: Gwendal Grignou <gwendal@chromium.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [used updated version of Evan's comment in loop_config_discard()] [moved backingq to local scope, removed redundant braces] Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-04-03 13:44:22 -06:00
Evan Green	8cd55087dc	loop: Report EOPNOTSUPP properly Properly plumb out EOPNOTSUPP from loop driver operations, which may get returned when for instance a discard operation is attempted but not supported by the underlying block device. Before this change, everything was reported in the log as an I/O error, which is scary and not helpful in debugging. Signed-off-by: Evan Green <evgreen@chromium.org> Reviewed-by: Gwendal Grignou <gwendal@chromium.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-04-03 13:44:20 -06:00
Linus Torvalds	1592614838	for-5.7/drivers-2020-03-29 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl6BJDYQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgplhMD/95jd4nlVetHAo54z+Zk2ExE13+yDamRKyh vc7t2tz1reqFOimtVr5aVuTXCTgOx4CpiIox5qcn6qAExN4JtCChOBRGize/0u8S ckxnhHbN2C0rfnGldvrYYeNRonFI+7QKimnurWUSYYGN0xqbo21BxJ7dFaohMseo q4K8sIW0ctE6AOlw28Jerkg614s2NDGZ7q1laheXnYHn5c9f1m0NaKN/jyTGgr0X TLBiLbX2yRrAuvpctBj6Fna6YN7Vdd9jsf2Bt6ipUI1XgHQoVUGMxQNhWPyjsbSv GzRQUNAfVcasLzCP/Mj/47144OkUtDDpn2mjeXDaFljLDGFULD+jp/SsOmLCxkPC gI7G2yfBvF96/SOyT0JXrLyMcBd1R2vRoASbc5tPu82mZhx7YJZH5WYtOB9h2gra RTYo3xcm0EoN6yeMaH+xOuXxTWWInIrgKPONW4H8s7hxEiMt5oFNVBI7vqPr4LVp tpfxiKZDavKOofKXogNV4W7mSMP/Ir5Q9Ha4g5SXHBGp0z/PHmnQ0xDGNq0KDnU4 eNO0UYCFNCNa+0AOhpNxaVuVm9LjrgvyXRjePgOZQ4akhohwHO6DLrHK1f8Hb1vD 8Ih6uR+F5zZlKsouWro8HLGYm5w40Wq9tbCI8QbPYH6nkGoDmzpPv9jbAeWgJU5c KqP/5TBSLA== =Bs4E -----END PGP SIGNATURE----- Merge tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: - floppy driver cleanup series from Willy - NVMe updates and fixes (Various) - null_blk trace improvements (Chaitanya) - bcache fixes (Coly) - md fixes (via Song) - loop block size change optimizations (Martijn) - scnprintf() use (Takashi) * tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-block: (81 commits) null_blk: add trace in null_blk_zoned.c null_blk: add tracepoint helpers for zoned mode block: add a zone condition debug helper nvme: cleanup namespace identifier reporting in nvme_init_ns_head nvme: rename __nvme_find_ns_head to nvme_find_ns_head nvme: refactor nvme_identify_ns_descs error handling nvme-tcp: Add warning on state change failure at nvme_tcp_setup_ctrl nvme-rdma: Add warning on state change failure at nvme_rdma_setup_ctrl nvme: Fix controller creation races with teardown flow nvme: Make nvme_uninit_ctrl symmetric to nvme_init_ctrl nvme: Fix ctrl use-after-free during sysfs deletion nvme-pci: Re-order nvme_pci_free_ctrl nvme: Remove unused return code from nvme_delete_ctrl_sync nvme: Use nvme_state_terminal helper nvme: release ida resources nvme: Add compat_ioctl handler for NVME_IOCTL_SUBMIT_IO nvmet-tcp: optimize tcp stack TX when data digest is used nvme-fabrics: Use scnprintf() for avoiding potential buffer overflow nvme-multipath: do not reset on unknown status nvmet-rdma: allocate RW ctxs according to mdts ...	2020-03-30 11:43:51 -07:00
Linus Torvalds	10f36b1e80	for-5.7/block-2020-03-29 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl6BJCoQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpvziEACqQC+QRKiqR6X5yaPWJ9LqjKE7lfI1PUb7 0a1z1mKuf8d6z0qNleUwdSOEaS5zJiswou2K8GLvEtTQH41QYsQkxc9GLjAyTveK szAyzZaa3BNUy9hkczm9i2arv3fI8XoTE3JvRM0e9wL8fBJDYCtKtHFJvF4hisOQ ydaJlU6tcwzd9bdV7K5dLwBxu3AeAJjzS3Tyfw25u9N9O/btUxJ91RTqBb2+Xeoz AVasfRlAqf/CzdjxCCmDgWE2QM4852pAeQ7UJJBGISNWNoiwkezMg+6HD0jEOLee bQ8uDyQdihIWTY+/zQasotX8/71uLV8QgtjWLXR9zrjrubIBWHGzoWSQ4kPg5DfQ bJmKO0VvWN2sshZEpWvzzAFGYxZViNphbK2Pb4hKOcv7jtMcC8mmEogh/7EqbD/n KB3IM9qVoXM8INm5o0dTy5uDRJxiHiHYkqsZaKz55BB/R4Geym5TINT3nXgxhQrn JoSwp4zdm3/NJOySruDi2eETqWJC2bsz3FsQSyCQTPOuP0nLtFKBb1UKHpmYTCXG H4LCyCKFJ6s006qBcdaNPZBw1mrSNwoxEulHnpYA4BFfPeXi72yrnMZQkdwWONpW LIVuD0hBm8X/pulbvEEdjzXBqZVkqK3xFX+uX5+bnwwaUKddXAC/h9SQKpBP2Mbb AeZToMklKw== =6Glq -----END PGP SIGNATURE----- Merge tag 'for-5.7/block-2020-03-29' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: - Online capacity resizing (Balbir) - Number of hardware queue change fixes (Bart) - null_blk fault injection addition (Bart) - Cleanup of queue allocation, unifying the node/no-node API (Christoph) - Cleanup of genhd, moving code to where it makes sense (Christoph) - Cleanup of the partition handling code (Christoph) - disk stat fixes/improvements (Konstantin) - BFQ improvements (Paolo) - Various fixes and improvements * tag 'for-5.7/block-2020-03-29' of git://git.kernel.dk/linux-block: (72 commits) block: return NULL in blk_alloc_queue() on error block: move bio_map_* to blk-map.c Revert "blkdev: check for valid request queue before issuing flush" block: simplify queue allocation bcache: pass the make_request methods to blk_queue_make_request null_blk: use blk_mq_init_queue_data block: add a blk_mq_init_queue_data helper block: move the ->devnode callback to struct block_device_operations block: move the part_stat* helpers from genhd.h to a new header block: move block layer internals out of include/linux/genhd.h block: move guard_bio_eod to bio.c block: unexport get_gendisk block: unexport disk_map_sector_rcu block: unexport disk_get_part block: mark part_in_flight and part_in_flight_rw static block: mark block_depr static block: factor out requeue handling from dispatch code block/diskstats: replace time_in_queue with sum of request times block/diskstats: accumulate all per-cpu counters in one pass block/diskstats: more accurate approximation of io_ticks for slow disks ...	2020-03-30 11:20:13 -07:00
Hannes Reinecke	f9b6b98d24	rbd: enable multiple blk-mq queues Allocate one queue per CPU and get a performance boost from higher parallelism. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:40 +02:00
Ilya Dryomov	59e542c869	rbd: embed image request in blk-mq pdu Avoid making allocations for !IMG_REQ_CHILD image requests. Only IMG_REQ_CHILD image requests need to be freed now. Move the initial request checks to rbd_queue_rq(). Unfortunately we can't fill the image request and kick the state machine directly from rbd_queue_rq() because ->queue_rq() isn't allowed to block. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:40 +02:00
Ilya Dryomov	a52cc68575	rbd: acquire header_rwsem just once in rbd_queue_workfn() Currently header_rwsem is acquired twice: once in rbd_dev_parent_get() when the image request is being created and then in rbd_queue_workfn() to capture mapping_size and snapc. Introduce rbd_img_capture_header() and move image request allocation so that header_rwsem can be acquired just once. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:40 +02:00
Ilya Dryomov	78b42a871a	rbd: get rid of img_request_layered_clear() No need to clear IMG_REQ_LAYERED before destroying the request. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:40 +02:00
Hannes Reinecke	679a97d286	rbd: kill img_request kref The reference counter is never increased, so we can as well call rbd_img_request_destroy() directly and drop the kref. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:40 +02:00
Ilya Dryomov	94f4857f4b	rbd: remove barriers from img_request_layered_{set,clear,test}() IMG_REQ_LAYERED is set in rbd_img_request_create(), and tested and cleared in rbd_img_request_destroy() when the image request is about to be destroyed. The barriers are unnecessary. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2020-03-30 12:42:40 +02:00
Chaitanya Kulkarni	766c3297d7	null_blk: add trace in null_blk_zoned.c With the help of previously added tracepoints we can now trace report-zones, zone-write and zone-mgmt ops in null_blk_zoned.c. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-27 13:39:10 -06:00
Chaitanya Kulkarni	c51d041998	null_blk: add tracepoint helpers for zoned mode This patch adds two new tracpoints for null_blk_zoned.c that allows us to trace report-zones, zone-mgmt-op and zone-write operations which has direct effect on the zone condition state machine. Also, we update drivers/block/Makefile so that new null_blk related tracefiles can be compiled. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-27 13:39:10 -06:00
Christoph Hellwig	3d745ea5b0	block: simplify queue allocation Current make_request based drivers use either blk_alloc_queue_node or blk_alloc_queue to allocate a queue, and then set up the make_request_fn function pointer and a few parameters using the blk_queue_make_request helper. Simplify this by passing the make_request pointer to blk_alloc_queue, and while at it merge the _node variant into the main helper by always passing a node_id, and remove the superfluous gfp_mask parameter. A lower-level __blk_alloc_queue is kept for the blk-mq case. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-27 10:23:43 -06:00
Christoph Hellwig	8d96a1117c	null_blk: use blk_mq_init_queue_data Use the new blk_mq_init_queue_data instead of open coding the queue allocation and initialization. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-27 10:23:43 -06:00
Christoph Hellwig	348e114bbd	block: move the ->devnode callback to struct block_device_operations There really isn't any good reason to stash a method directly into struct gendisk. Move it together with the other block device operations. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-27 09:50:05 -06:00
Christoph Hellwig	c6a564ffad	block: move the part_stat* helpers from genhd.h to a new header These macros are just used by a few files. Move them out of genhd.h, which is included everywhere into a new standalone header. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-25 09:50:09 -06:00
Gustavo A. R. Silva	431d6e3eec	rsxx: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertenly introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-19 15:26:45 -06:00
Balbir Singh	3cbc28bb90	xen-blkfront.c: Convert to use set_capacity_revalidate_and_notify block/genhd provides set_capacity_revalidate_and_notify() for sending RESIZE notifications via uevents. Signed-off-by: Balbir Singh <sblbir@amazon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-18 15:13:21 -06:00
Balbir Singh	662155e289	virtio_blk.c: Convert to use set_capacity_revalidate_and_notify block/genhd provides set_capacity_revalidate_and_notify() for sending RESIZE notifications via uevents. Signed-off-by: Balbir Singh <sblbir@amazon.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-18 15:13:21 -06:00
Willy Tarreau	e83995c9f8	floppy: rename the global "fdc" variable to "current_fdc" This is done in order to remove the confusion that arises at some places in the code where local variables or arguments shadow the global variable. It is already visible that some places are a bit awkward and iterate over the global variable, for the sole reason that they used to rely on it being named "fdc" in order to get the correct address when using FD_DOR. These ones are easy to spot by searching for "for (current_fdc...". Some more cleanup is definitely possible. For example "fdc_state[current_fdc].somefield" is used all over the code and would probably be better with "fdc_state->somefield" with fdc_state being set when current_fdc is assigned. This would require to pass the pointer to the current state instead of the current_fdc to the I/O functions. Link: https://lore.kernel.org/r/20200301195555.11154-7-w@1wt.eu Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:58 -06:00
Willy Tarreau	e2032464fe	floppy: separate the FDC's base address from its registers FDC registers FD_STATUS, FD_DATA, FD_DOR, FD_DIR and FD_DCR used to be defined relative to FD_IOPORT, which is the FDC's base address, itself a macro depending on the "fdc" local or global variable. This patch changes this so that the register macros above now only reference the address offset, and that the FDC's address is explicitly passed in each call to fd_inb() and fd_outb(), thus removing the macro. With this change there is no more implicit usage of the local/global "fdc" variable. One place in the ARM code used to check if the port was equal to FD_DOR, this was changed to testing the register by applying a mask to the port, as was already done in the sparc code. There are still occurrences of fd_inb() and fd_outb() in the PARISC code and these ones remain unaffected since they already used to work with a base address and a register offset. The sparc, m68k and parisc code could now be slightly cleaned up to benefit from the macro definitions above instead of the equivalent hard-coded values. Link: https://lore.kernel.org/r/20200301195555.11154-6-w@1wt.eu Cc: Ian Molton <spyro@f2s.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:58 -06:00
Willy Tarreau	ac7018614d	floppy: introduce new functions fdc_inb() and fdc_outb() These two functions replace fd_inb() and fd_outb() in that they take the FDC in argument. This will ease the separation of the base address and the port everywhere the code is used. Link: https://lore.kernel.org/r/20200301195555.11154-5-w@1wt.eu Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:58 -06:00
Willy Tarreau	8fb3845023	floppy: cleanup: expand the reply_buffer macros Several macros were used to access reply_buffer[] at discrete positions without making it obvious they were relying on this. These ones have been replaced by their offset in the reply buffer to make these accesses more obvious. Link: https://lore.kernel.org/r/20200224212352.8640-11-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:58 -06:00
Willy Tarreau	76dabe7960	floppy: cleanup: expand the R/W / format command macros Various macros were used to access raw_cmd for R/W or format commands without making it obvious that raw_cmd->cmd[] was used. Let's expand the macros to make this more obvious. Link: https://lore.kernel.org/r/20200224212352.8640-10-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:57 -06:00
Willy Tarreau	2a34875279	floppy: cleanup: expand macro DRWE This macro doesn't bring much value and only slightly obfuscates the code by silently using global variable "current_drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-9-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:57 -06:00
Willy Tarreau	3bd7f87c68	floppy: cleanup: expand macro DRS This macro doesn't bring much value and only slightly obfuscates the code by silently using global variable "current_drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-8-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:57 -06:00
Willy Tarreau	031faabd80	floppy: cleanup: expand macro DP This macro doesn't bring much value and only slightly obfuscates the code by silently using global variable "current_drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-7-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:57 -06:00
Willy Tarreau	121e297955	floppy: cleanup: expand macro UDRWE This macro doesn't bring much value and only slightly obfuscates the code by silently using local variable "drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-6-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:57 -06:00
Willy Tarreau	8d9d34e25a	floppy: cleanup: expand macro UDRS This macro doesn't bring much value and only slightly obfuscates the code by silently using local variable "drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-5-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:57 -06:00
Willy Tarreau	1ce9ae9654	floppy: cleanup: expand macro UDP This macro doesn't bring much value and only slightly obfuscates the code by silently using local variable "drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-4-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:57 -06:00
Willy Tarreau	f9d322bdb1	floppy: cleanup: expand macro UFDCS This macro doesn't bring much value and only slightly obfuscates the code by silently using local variable "drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-3-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:57 -06:00
Willy Tarreau	de6048b843	floppy: cleanup: expand macro FDCS Macro FDCS silently uses identifier "fdc" which may be either the global one or a local one. Let's expand the macro to make this more obvious. Link: https://lore.kernel.org/r/20200224212352.8640-2-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-16 08:26:57 -06:00
Dongli Zhang	290df92a94	null_blk: describe the usage of fault injection param As null_blk is a very good start point to test block layer, this patch adds description and comments to 'timeout', 'requeue' and 'init_hctx' to explain how to use fault injection with null_blk. The nvme has similar with nvme_core.fail_request in the form of comment. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-12 18:54:28 -06:00
Alexey Dobriyan	ff77042296	null_blk: fix spurious IO errors after failed past-wp access Steps to reproduce: BLKRESETZONE zone 0 // force EIO pwrite(fd, buf, 4096, 4096); [issue more IO including zone ioctls] It will start failing randomly including IO to unrelated zones because of ->error "reuse". Trigger can be partition detection as well if test is not run immediately which is even more entertaining. The fix is of course to clear ->error where necessary. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexey Dobriyan (SK hynix) <adobriyan@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-12 09:10:03 -06:00
Hou Pu	2c272542ba	nbd: requeue command if the soecket is changed In commit `2da22da573` (nbd: fix zero cmd timeout handling v2), it is allowed to reset timer when it fires if tag_set.timeout is set to zero. If the server is shutdown and a new socket is reconfigured, the request should be requeued to be processed by new server instead of waiting for response from the old one. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Hou Pu <houpu@bytedance.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-12 08:01:24 -06:00
Hou Pu	d970958b2d	nbd: enable replace socket if only one connection is configured Nbd server with multiple connections could be upgraded since `560bc4b` (nbd: handle dead connections). But if only one conncection is configured, after we take down nbd server, all inflight IO would finally timeout and return error. We could requeue them like what we do with multiple connections and wait for new socket in submit path. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Hou Pu <houpu@bytedance.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-12 07:58:39 -06:00
Jackie Liu	91dfa2dd81	block/drbd: delete invalid function drbd_md_mark_dirty_ We deleted last_md_mark_dirty long ago, this function no longer needs to exist, delete it, otherwise a compilation error will occur when DEBUG is opened. Fixes: `ac0acb9e39` ("drbd: use drbd_device_post_work() in more place") Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-12 07:49:09 -06:00
Takashi Iwai	0348510490	block: aoe: Use scnprintf() for avoiding potential buffer overflow Since snprintf() returns the would-be-output size instead of the actual output size, the succeeding calls may go beyond the given buffer limit. Fix it by replacing with scnprintf(). Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-12 07:39:04 -06:00
Martijn Coenen	0fbcf57982	loop: Only freeze block queue when needed. __loop_update_dio() can be called as a part of loop_set_fd(), when the block queue is not yet up and running; avoid freezing the block queue in that case, since that is an expensive operation. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Martijn Coenen <maco@android.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-10 14:10:43 -06:00
Martijn Coenen	7e81f99afd	loop: Only change blocksize when needed. Return early in loop_set_block_size() if the requested block size is identical to the one we already have; this avoids expensive calls to freeze the block queue. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martijn Coenen <maco@android.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-10 14:10:41 -06:00
Bart Van Assche	596444e757	null_blk: Add support for init_hctx() fault injection This makes it possible to test the error path in blk_mq_realloc_hw_ctxs() and also several error paths in null_blk. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: Johannes Thumshirn <jth@kernel.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-10 07:09:59 -06:00
Bart Van Assche	9b03b71308	null_blk: Handle null_add_dev() failures properly If null_add_dev() fails then null_del_dev() is called with a NULL argument. Make null_del_dev() handle this scenario correctly. This patch fixes the following KASAN complaint: null-ptr-deref in null_del_dev+0x28/0x280 [null_blk] Read of size 8 at addr 0000000000000000 by task find/1062 Call Trace: dump_stack+0xa5/0xe6 __kasan_report.cold+0x65/0x99 kasan_report+0x16/0x20 __asan_load8+0x58/0x90 null_del_dev+0x28/0x280 [null_blk] nullb_group_drop_item+0x7e/0xa0 [null_blk] client_drop_item+0x53/0x80 [configfs] configfs_rmdir+0x395/0x4e0 [configfs] vfs_rmdir+0xb6/0x220 do_rmdir+0x238/0x2c0 __x64_sys_unlinkat+0x75/0x90 do_syscall_64+0x6f/0x2f0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Johannes Thumshirn <jth@kernel.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-03-10 07:09:59 -06:00

1 2 3 4 5 ...

6622 Commits