linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-26 12:52:30 +00:00

Author	SHA1	Message	Date
Coly Li	0b13efecf5	bcache: add return value check to bch_cached_dev_run() This patch adds return value check to bch_cached_dev_run(), now if there is error happens inside bch_cached_dev_run(), it can be catched. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-28 07:39:14 -06:00
Alexandru Ardelean	89e0341af0	bcache: use sysfs_match_string() instead of __sysfs_match_string() The arrays (of strings) that are passed to __sysfs_match_string() are static, so use sysfs_match_string() which does an implicit ARRAY_SIZE() over these arrays. Functionally, this doesn't change anything. The change is more cosmetic. It only shrinks the static arrays by 1 byte each. Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-28 07:39:14 -06:00
Coly Li	f960facb39	bcache: remove unnecessary prefetch() in bset_search_tree() In function bset_search_tree(), when p >= t->size, t->tree[0] will be prefetched by the following code piece, 974 unsigned int p = n << 4; 975 976 p &= ((int) (p - t->size)) >> 31; 977 978 prefetch(&t->tree[p]); The purpose of the above code is to avoid a branch instruction, but when p >= t->size, prefetch(&t->tree[0]) has no positive performance contribution at all. This patch avoids the unncessary prefetch by only calling prefetch() when p < t->size. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-28 07:39:14 -06:00
Coly Li	08ec1e6282	bcache: add io error counting in write_bdev_super_endio() When backing device super block is written by bch_write_bdev_super(), the bio complete callback write_bdev_super_endio() simply ignores I/O status. Indeed such write request also contribute to backing device health status if the request failed. This patch checkes bio->bi_status in write_bdev_super_endio(), if there is error, bch_count_backing_io_errors() will be called to count an I/O error to dc->io_errors. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-28 07:39:14 -06:00
Coly Li	578df99b1b	bcache: ignore read-ahead request failure on backing device When md raid device (e.g. raid456) is used as backing device, read-ahead requests on a degrading and recovering md raid device might be failured immediately by md raid code, but indeed this md raid array can still be read or write for normal I/O requests. Therefore such failed read-ahead request are not real hardware failure. Further more, after degrading and recovering accomplished, read-ahead requests will be handled by md raid array again. For such condition, I/O failures of read-ahead requests don't indicate real health status (because normal I/O still be served), they should not be counted into I/O error counter dc->io_errors. Since there is no simple way to detect whether the backing divice is a md raid device, this patch simply ignores I/O failures for read-ahead bios on backing device, to avoid bogus backing device failure on a degrading md raid array. Suggested-and-tested-by: Thorsten Knabe <linux@thorsten-knabe.de> Signed-off-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-28 07:39:14 -06:00
Coly Li	e6dcbd3e6c	bcache: avoid flushing btree node in cache_set_flush() if io disabled When cache_set_flush() is called for too many I/O errors detected on cache device and the cache set is retiring, inside the function it doesn't make sense to flushing cached btree nodes from c->btree_cache because CACHE_SET_IO_DISABLE is set on c->flags already and all I/Os onto cache device will be rejected. This patch checks in cache_set_flush() that whether CACHE_SET_IO_DISABLE is set. If yes, then avoids to flush the cached btree nodes to reduce more time and make cache set retiring more faster. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-28 07:39:14 -06:00
Coly Li	695277f16b	Revert "bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()" This reverts commit `6147305c73`. Although this patch helps the failed bcache device to stop faster when too many I/O errors detected on corresponding cached device, setting CACHE_SET_IO_DISABLE bit to cache set c->flags was not a good idea. This operation will disable all I/Os on cache set, which means other attached bcache devices won't work neither. Without this patch, the failed bcache device can also be stopped eventually if internal I/O accomplished (e.g. writeback). Therefore here I revert it. Fixes: `6147305c73` ("bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()") Reported-by: Yong Li <mr.liyong@qq.com> Signed-off-by: Coly Li <colyli@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-28 07:39:14 -06:00
Coly Li	0ae49cb7aa	bcache: fix return value error in bch_journal_read() When everything is OK in bch_journal_read(), finally the return value is returned by, return ret; which assumes ret will be 0 here. This assumption is wrong when all journal buckets as are full and filled with valid journal entries. In such cache the last location referencess read_bucket() sets 'ret' to 1, which means new jset added into jset list. The jset list is list 'journal' in caller run_cache_set(). Return 1 to run_cache_set() means something wrong and the cache set won't start, but indeed everything is OK. This patch changes the line at end of bch_journal_read() to directly return 0 since everything if verything is good. Then a bogus error is fixed. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-28 07:39:14 -06:00
Coly Li	b387e9b586	bcache: check c->gc_thread by IS_ERR_OR_NULL in cache_set_flush() When system memory is in heavy pressure, bch_gc_thread_start() from run_cache_set() may fail due to out of memory. In such condition, c->gc_thread is assigned to -ENOMEM, not NULL pointer. Then in following failure code path bch_cache_set_error(), when cache_set_flush() gets called, the code piece to stop c->gc_thread is broken, if (!IS_ERR_OR_NULL(c->gc_thread)) kthread_stop(c->gc_thread); And KASAN catches such NULL pointer deference problem, with the warning information: [ 561.207881] ================================================================== [ 561.207900] BUG: KASAN: null-ptr-deref in kthread_stop+0x3b/0x440 [ 561.207904] Write of size 4 at addr 000000000000001c by task kworker/15:1/313 [ 561.207913] CPU: 15 PID: 313 Comm: kworker/15:1 Tainted: G W 5.0.0-vanilla+ #3 [ 561.207916] Hardware name: Lenovo ThinkSystem SR650 -[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE136T-2.10]- 03/22/2019 [ 561.207935] Workqueue: events cache_set_flush [bcache] [ 561.207940] Call Trace: [ 561.207948] dump_stack+0x9a/0xeb [ 561.207955] ? kthread_stop+0x3b/0x440 [ 561.207960] ? kthread_stop+0x3b/0x440 [ 561.207965] kasan_report+0x176/0x192 [ 561.207973] ? kthread_stop+0x3b/0x440 [ 561.207981] kthread_stop+0x3b/0x440 [ 561.207995] cache_set_flush+0xd4/0x6d0 [bcache] [ 561.208008] process_one_work+0x856/0x1620 [ 561.208015] ? find_held_lock+0x39/0x1d0 [ 561.208028] ? drain_workqueue+0x380/0x380 [ 561.208048] worker_thread+0x87/0xb80 [ 561.208058] ? __kthread_parkme+0xb6/0x180 [ 561.208067] ? process_one_work+0x1620/0x1620 [ 561.208072] kthread+0x326/0x3e0 [ 561.208079] ? kthread_create_worker_on_cpu+0xc0/0xc0 [ 561.208090] ret_from_fork+0x3a/0x50 [ 561.208110] ================================================================== [ 561.208113] Disabling lock debugging due to kernel taint [ 561.208115] irq event stamp: 11800231 [ 561.208126] hardirqs last enabled at (11800231): [<ffffffff83008538>] do_syscall_64+0x18/0x410 [ 561.208127] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c [ 561.208129] #PF error: [WRITE] [ 561.312253] hardirqs last disabled at (11800230): [<ffffffff830052ff>] trace_hardirqs_off_thunk+0x1a/0x1c [ 561.312259] softirqs last enabled at (11799832): [<ffffffff850005c7>] __do_softirq+0x5c7/0x8c3 [ 561.405975] PGD 0 P4D 0 [ 561.442494] softirqs last disabled at (11799821): [<ffffffff831add2c>] irq_exit+0x1ac/0x1e0 [ 561.791359] Oops: 0002 [#1] SMP KASAN NOPTI [ 561.791362] CPU: 15 PID: 313 Comm: kworker/15:1 Tainted: G B W 5.0.0-vanilla+ #3 [ 561.791363] Hardware name: Lenovo ThinkSystem SR650 -[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE136T-2.10]- 03/22/2019 [ 561.791371] Workqueue: events cache_set_flush [bcache] [ 561.791374] RIP: 0010:kthread_stop+0x3b/0x440 [ 561.791376] Code: 00 00 65 8b 05 26 d5 e0 7c 89 c0 48 0f a3 05 ec aa df 02 0f 82 dc 02 00 00 4c 8d 63 20 be 04 00 00 00 4c 89 e7 e8 65 c5 53 00 <f0> ff 43 20 48 8d 7b 24 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 [ 561.791377] RSP: 0018:ffff88872fc8fd10 EFLAGS: 00010286 [ 561.838895] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 561.838916] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 561.838934] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 561.838948] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 561.838966] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 561.838979] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 561.838996] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 563.067028] RAX: 0000000000000000 RBX: fffffffffffffffc RCX: ffffffff832dd314 [ 563.067030] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000297 [ 563.067032] RBP: ffff88872fc8fe88 R08: fffffbfff0b8213d R09: fffffbfff0b8213d [ 563.067034] R10: 0000000000000001 R11: fffffbfff0b8213c R12: 000000000000001c [ 563.408618] R13: ffff88dc61cc0f68 R14: ffff888102b94900 R15: ffff88dc61cc0f68 [ 563.408620] FS: 0000000000000000(0000) GS:ffff888f7dc00000(0000) knlGS:0000000000000000 [ 563.408622] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 563.408623] CR2: 000000000000001c CR3: 0000000f48a1a004 CR4: 00000000007606e0 [ 563.408625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 563.408627] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 563.904795] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 563.915796] PKRU: 55555554 [ 563.915797] Call Trace: [ 563.915807] cache_set_flush+0xd4/0x6d0 [bcache] [ 563.915812] process_one_work+0x856/0x1620 [ 564.001226] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 564.033563] ? find_held_lock+0x39/0x1d0 [ 564.033567] ? drain_workqueue+0x380/0x380 [ 564.033574] worker_thread+0x87/0xb80 [ 564.062823] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 564.118042] ? __kthread_parkme+0xb6/0x180 [ 564.118046] ? process_one_work+0x1620/0x1620 [ 564.118048] kthread+0x326/0x3e0 [ 564.118050] ? kthread_create_worker_on_cpu+0xc0/0xc0 [ 564.167066] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 564.252441] ret_from_fork+0x3a/0x50 [ 564.252447] Modules linked in: msr rpcrdma sunrpc rdma_ucm ib_iser ib_umad rdma_cm ib_ipoib i40iw configfs iw_cm ib_cm libiscsi scsi_transport_iscsi mlx4_ib ib_uverbs mlx4_en ib_core nls_iso8859_1 nls_cp437 vfat fat intel_rapl skx_edac x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ses raid0 aesni_intel cdc_ether enclosure usbnet ipmi_ssif joydev aes_x86_64 i40e scsi_transport_sas mii bcache md_mod crypto_simd mei_me ioatdma crc64 ptp cryptd pcspkr i2c_i801 mlx4_core glue_helper pps_core mei lpc_ich dca wmi ipmi_si ipmi_devintf nd_pmem dax_pmem nd_btt ipmi_msghandler device_dax pcc_cpufreq button hid_generic usbhid mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect xhci_pci sysimgblt fb_sys_fops xhci_hcd ttm megaraid_sas drm usbcore nfit libnvdimm sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs [ 564.299390] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree. [ 564.348360] CR2: 000000000000001c [ 564.348362] ---[ end trace b7f0e5cc7b2103b0 ]--- Therefore, it is not enough to only check whether c->gc_thread is NULL, we should use IS_ERR_OR_NULL() to check both NULL pointer and error value. This patch changes the above buggy code piece in this way, if (!IS_ERR_OR_NULL(c->gc_thread)) kthread_stop(c->gc_thread); Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-28 07:39:13 -06:00
Coly Li	141df8bb5d	bcache: don't set max writeback rate if gc is running When gc is running, user space I/O processes may wait inside bcache code, so no new I/O coming. Indeed this is not a real idle time, maximum writeback rate should not be set in such situation. Otherwise a faster writeback thread may compete locks with gc thread and makes garbage collection slower, which results a longer I/O freeze period. This patch checks c->gc_mark_valid in set_at_max_writeback_rate(). If c->gc_mark_valid is 0 (gc running), set_at_max_writeback_rate() returns false, then update_writeback_rate() will not set writeback rate to maximum value even c->idle_counter reaches an idle threshold. Now writeback thread won't interfere gc thread performance. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-28 07:39:13 -06:00
Damien Le Moal	a5b47a40be	block: Remove unused code bio_flush_dcache_pages() is unused. Remove it. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-27 07:34:25 -06:00
Douglas Anderson	2b50f230f7	block, bfq: Init saved_wr_start_at_switch_to_srt in unlikely case Some debug code suggested by Paolo was tripping when I did reboot stress tests. Specifically in bfq_bfqq_resume_state() "bic->saved_wr_start_at_switch_to_srt" was later than the current value of "jiffies". A bit of debugging showed that "bic->saved_wr_start_at_switch_to_srt" was actually 0 and a bit more debugging showed that was because we had run through the "unlikely" case in the bfq_bfqq_save_state() function. Let's init "saved_wr_start_at_switch_to_srt" in the unlikely case to something sane. NOTE: this fixes no known real-world errors. Reviewed-by: Paolo Valente <paolo.valente@linaro.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-26 16:25:35 -06:00
Jens Axboe	2ad7a0cc8f	Merge branch 'md-next' of https://github.com/liu-song-6/linux into for-5.3/block Pull single MD warning fix from Song. * 'md-next' of https://github.com/liu-song-6/linux: md/raid1: Fix a warning message in remove_wb()	2019-06-26 13:49:01 -06:00
Dan Carpenter	16d4b74654	md/raid1: Fix a warning message in remove_wb() The WARN_ON() macro doesn't take an error message, it just takes a condition. I've changed this to use WARN(1, "...") instead. Fixes: `3e148a3209` ("md/raid1: fix potential data inconsistency issue with write behind device") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2019-06-26 12:18:48 -07:00
Paolo Valente	3726112ec7	block, bfq: re-schedule empty queues if they deserve I/O plugging Consider, on one side, a bfq_queue Q that remains empty while in service, and, on the other side, the pending I/O of bfq_queues that, according to their timestamps, have to be served after Q. If an uncontrolled amount of I/O from the latter bfq_queues were dispatched while Q is waiting for its new I/O to arrive, then Q's bandwidth guarantees would be violated. To prevent this, I/O dispatch is plugged until Q receives new I/O (except for a properly controlled amount of injected I/O). Unfortunately, preemption breaks I/O-dispatch plugging, for the following reason. Preemption is performed in two steps. First, Q is expired and re-scheduled. Second, the new bfq_queue to serve is chosen. The first step is needed by the second, as the second can be performed only after Q's timestamps have been properly updated (done in the expiration step), and Q has been re-queued for service. This dependency is a consequence of the way how BFQ's scheduling algorithm is currently implemented. But Q is not re-scheduled at all in the first step, because Q is empty. As a consequence, an uncontrolled amount of I/O may be dispatched until Q becomes non empty again. This breaks Q's service guarantees. This commit addresses this issue by re-scheduling Q even if it is empty. This in turn breaks the assumption that all scheduled queues are non empty. Then a few extra checks are now needed. Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-25 09:07:35 -06:00
Paolo Valente	96a291c38c	block, bfq: preempt lower-weight or lower-priority queues BFQ enqueues the I/O coming from each process into a separate bfq_queue, and serves bfq_queues one at a time. Each bfq_queue may be served for at most timeout_sync milliseconds (default: 125 ms). This service scheme is prone to the following inaccuracy. While a bfq_queue Q1 is in service, some empty bfq_queue Q2 may receive I/O, and, according to BFQ's scheduling policy, may become the right bfq_queue to serve, in place of the currently in-service bfq_queue. In this respect, postponing the service of Q2 to after the service of Q1 finishes may delay the completion of Q2's I/O, compared with an ideal service in which all non-empty bfq_queues are served in parallel, and every non-empty bfq_queue is served at a rate proportional to the bfq_queue's weight. This additional delay is equal at most to the time Q1 may unjustly remain in service before switching to Q2. If Q1 and Q2 have the same weight, then this time is most likely negligible compared with the completion time to be guaranteed to Q2's I/O. In addition, first, one of the reasons why BFQ may want to serve Q1 for a while is that this boosts throughput and, second, serving Q1 longer reduces BFQ's overhead. As a conclusion, it is usually better not to preempt Q1 if both Q1 and Q2 have the same weight. In contrast, as Q2's weight or priority becomes higher and higher compared with that of Q1, the above delay becomes larger and larger, compared with the I/O completion times that have to be guaranteed to Q2 according to Q2's weight. So reducing this delay may be more important than avoiding the costs of preempting Q1. Accordingly, this commit preempts Q1 if Q2 has a higher weight or a higher priority than Q1. Preemption causes Q1 to be re-scheduled, and triggers a new choice of the next bfq_queue to serve. If Q2 really is the next bfq_queue to serve, then Q2 will be set in service immediately. This change reduces the component of the I/O latency caused by the above delay by about 80%. For example, on an (old) PLEXTOR PX-256M5 SSD, the maximum latency reported by fio drops from 15.1 to 3.2 ms for a process doing sporadic random reads while another process is doing continuous sequential reads. Signed-off-by: Nicola Bottura <bottura.nicola95@gmail.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-25 09:07:35 -06:00
Paolo Valente	13a857a4c4	block, bfq: detect wakers and unconditionally inject their I/O A bfq_queue Q may happen to be synchronized with another bfq_queue Q2, i.e., the I/O of Q2 may need to be completed for Q to receive new I/O. We call Q2 "waker queue". If I/O plugging is being performed for Q, and Q is not receiving any more I/O because of the above synchronization, then, thanks to BFQ's injection mechanism, the waker queue is likely to get served before the I/O-plugging timeout fires. Unfortunately, this fact may not be sufficient to guarantee a high throughput during the I/O plugging, because the inject limit for Q may be too low to guarantee a lot of injected I/O. In addition, the duration of the plugging, i.e., the time before Q finally receives new I/O, may not be minimized, because the waker queue may happen to be served only after other queues. To address these issues, this commit introduces the explicit detection of the waker queue, and the unconditional injection of a pending I/O request of the waker queue on each invocation of bfq_dispatch_request(). One may be concerned that this systematic injection of I/O from the waker queue delays the service of Q's I/O. Fortunately, it doesn't. On the contrary, next Q's I/O is brought forward dramatically, for it is not blocked for milliseconds. Reported-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Tested-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-25 09:07:34 -06:00
Paolo Valente	a3f9bce369	block, bfq: bring forward seek&think time update Until the base value for request service times gets finally computed for a bfq_queue, the inject limit for that queue does depend on the think-time state (short\|long) of the queue. A timely update of the think time then guarantees a quicker activation or deactivation of the injection. Fortunately, the think time of a bfq_queue is updated in the same code path as the inject limit; but after the inject limit. This commits moves the update of the think time before the update of the inject limit. For coherence, it moves the update of the seek time too. Reported-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Tested-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-25 09:07:34 -06:00
Paolo Valente	24792ad01c	block, bfq: update base request service times when possible I/O injection gets reduced if it increases the request service times of the victim queue beyond a certain threshold. The threshold, in its turn, is computed as a function of the base service time enjoyed by the queue when it undergoes no injection. As a consequence, for injection to work properly, the above base value has to be accurate. In this respect, such a value may vary over time. For example, it varies if the size or the spatial locality of the I/O requests in the queue change. It is then important to update this value whenever possible. This commit performs this update. Reported-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Tested-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-25 09:07:34 -06:00
Paolo Valente	db599f9ed9	block, bfq: fix rq_in_driver check in bfq_update_inject_limit One of the cases where the parameters for injection may be updated is when there are no more in-flight I/O requests. The number of in-flight requests is stored in the field bfqd->rq_in_driver of the descriptor bfqd of the device. So, the controlled condition is bfqd->rq_in_driver == 0. Unfortunately, this is wrong because, the instruction that checks this condition is in the code path that handles the completion of a request, and, in particular, the instruction is executed before bfqd->rq_in_driver is decremented in such a code path. This commit fixes this issue by just replacing 0 with 1 in the comparison. Reported-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Tested-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-25 09:07:34 -06:00
Paolo Valente	766d61412e	block, bfq: reset inject limit when think-time state changes Until the base value of the request service times gets finally computed for a bfq_queue, the inject limit does depend on the think-time state (short\|long). The limit must be 0 or 1 if the think time is deemed, respectively, as short or long. However, such a check and possible limit update is performed only periodically, once per second. So, to make the injection mechanism much more reactive, this commit performs the update also every time the think-time state changes. In addition, in the following special case, this commit lets the inject limit of a bfq_queue bfqq remain equal to 1 even if bfqq's think time is short: bfqq's I/O is synchronized with that of some other queue, i.e., bfqq may receive new I/O only after the I/O of the other queue is completed. Keeping the inject limit to 1 allows the blocking I/O to be served while bfqq is in service. And this is very convenient both for bfqq and for the total throughput, as explained in detail in the comments in bfq_update_has_short_ttime(). Reported-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Tested-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-25 09:07:34 -06:00
Jens Axboe	6b2c8e522c	Merge branch 'nvme-5.3' of git://git.infradead.org/nvme into for-5.3/block Pull NVMe updates from Christoph: "A large chunk of NVMe updates for 5.3. Highlights: - improved PCIe suspent support (Keith Busch) - error injection support for the admin queue (Akinobu Mita) - Fibre Channel discovery improvements (James Smart) - tracing improvements including nvmetc tracing support (Minwoo Im) - misc fixes and cleanups (Anton Eidelman, Minwoo Im, Chaitanya Kulkarni)" * 'nvme-5.3' of git://git.infradead.org/nvme: (26 commits) Documentation: nvme: add an example for nvme fault injection nvme: enable to inject errors into admin commands nvme: prepare for fault injection into admin commands nvmet: introduce target-side trace nvme-trace: print result and status in hex format nvme-trace: support for fabrics commands in host-side nvme-trace: move opcode symbol print to nvme.h nvme-trace: do not export nvme_trace_disk_name nvme-pci: clean up nvme_remove_dead_ctrl a bit nvme-pci: properly report state change failure in nvme_reset_work nvme-pci: set the errno on ctrl state change error nvme-pci: adjust irq max_vector using num_possible_cpus() nvme-pci: remove queue_count_ops for write_queues and poll_queues nvme-pci: remove unnecessary zero for static var nvme-pci: use host managed power state for suspend nvme: introduce nvme_is_fabrics to check fabrics cmd nvme: export get and set features nvme: fix possible io failures when removing multipathed ns nvme-fc: add message when creating new association lpfc: add sysfs interface to post NVME RSCN ...	2019-06-24 10:10:35 -06:00
Akinobu Mita	7e31d8215f	Documentation: nvme: add an example for nvme fault injection This adds an example of how to inject errors into admin commands. Suggested-by: Thomas Tai <thomas.tai@oracle.com> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Minwoo Im <minwoo.im@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:15:50 +02:00
Akinobu Mita	f79d5fda4e	nvme: enable to inject errors into admin commands This enables to inject errors into the commands submitted to the admin queue. It is useful to test error handling in the controller initialization. # echo 100 > /sys/kernel/debug/nvme0/fault_inject/probability # echo 1 > /sys/kernel/debug/nvme0/fault_inject/times # echo 10 > /sys/kernel/debug/nvme0/fault_inject/space # nvme reset /dev/nvme0 # dmesg ... nvme nvme0: Could not set queue count (16385) nvme nvme0: IO queues not created Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:15:50 +02:00
Akinobu Mita	a3646451ed	nvme: prepare for fault injection into admin commands Currenlty fault injection support for nvme only enables to inject errors into the commands submitted to I/O queues. In preparation for fault injection into the admin commands, this makes the helper functions independent of struct nvme_ns. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:15:50 +02:00
Minwoo Im	a5448fdc46	nvmet: introduce target-side trace This patch introduces target-side request tracing. As Christoph suggested, the trace would not be in a core or module to avoid disadvantages like cache miss: http://lists.infradead.org/pipermail/linux-nvme/2019-June/024721.html The target-side trace code is entirely based on the Johannes's trace code from the host side. It has lots of codes duplicated, but it would be better than having advantages mentioned above. It also traces not only fabrics commands, but also nvme normal commands. Once the codes to be shared gets bigger, then we can make it common as suggsted. This also removed the create_sq and create_cq trace parsing functions because it will be done by the connect fabrics command. Example: echo 1 > /sys/kernel/debug/tracing/event/nvmet/nvmet_req_init/enable echo 1 > /sys/kernel/debug/tracing/event/nvmet/nvmet_req_complete/enable cat /sys/kernel/debug/tracing/trace Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> [hch: fixed the symbol namespace and a an endianess conversion] Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:15:46 +02:00
Geert Uytterhoeven	2f5af4ab7d	lightnvm: fix uninitialized pointer in nvm_remove_tgt() With gcc 4.1: drivers/lightnvm/core.c: In function ‘nvm_remove_tgt’: drivers/lightnvm/core.c:510: warning: ‘t’ is used uninitialized in this function Indeed, if no NVM devices have been registered, t will be an uninitialized pointer, and may be dereferenced later. A call to nvm_remove_tgt() can be triggered from userspace by issuing the NVM_DEV_REMOVE ioctl on the lightnvm control device. Fix this by preinitializing t to NULL. Fixes: `843f2edbdd` ("lightnvm: do not remove instance under global lock") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-21 03:14:30 -06:00
Heiner Litz	510fd8ea98	lightnvm: pblk: fix freeing of merged pages bio_add_pc_page() may merge pages when a bio is padded due to a flush. Fix iteration over the bio to free the correct pages in case of a merge. Signed-off-by: Heiner Litz <hlitz@ucsc.edu> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-06-21 03:14:29 -06:00
Minwoo Im	5f965f4fd9	nvme-trace: print result and status in hex format The "result" field is in 64bit to be printed out which means it could be like: nvme_complete_rq: nvme0: qid=0, cmdid=0, res=18446612684158962624, etries=0, flags=0x0, status=0 Switch both the result and status field to be printed in hexadecimal format to be easier to read. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:12:37 +02:00
Minwoo Im	ad795e47cd	nvme-trace: support for fabrics commands in host-side This patch introduces fabrics commands tracing feature from host-side. This patch does not include any changes for the previous host-side tracing, but just add fabrics commands parsing in cmd=() format. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> [hch: fixed some whitespace damage] Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:12:22 +02:00
Minwoo Im	26f2990d85	nvme-trace: move opcode symbol print to nvme.h The following patches are going to provide the target-side trace which might need these kind of macros. It would be great if it can be shared between host and target side both. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:12:19 +02:00
Minwoo Im	7183a46a48	nvme-trace: do not export nvme_trace_disk_name nvme_trace_disk_name() is now already being invoked with the function prototype in trace.h. We don't need to export this symbol at all. The following patches are going to provide target-side trace feature with the exactly same function with this so that this patch removes the EXPORT_SYMBOL() for this function. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:11:56 +02:00
Chaitanya Kulkarni	7c1ce408eb	nvme-pci: clean up nvme_remove_dead_ctrl a bit Remove the status parameter o nvme_remove_dead_ctrl(), which is only used for printing it. We move the print message to the same function where actual error is occurring. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:39 +02:00
Minwoo Im	cee6c269b0	nvme-pci: properly report state change failure in nvme_reset_work If the state change to NVME_CTRL_CONNECTING fails, the dmesg is going to be like: [ 293.689160] nvme nvme0: failed to mark controller CONNECTING [ 293.689160] nvme nvme0: Removing after probe failure status: 0 Even it prints the first line to indicate the situation, the second line is not proper because the status is 0 which means normally success of the previous operation. This patch makes it indicate the proper error value when it fails. [ 25.932367] nvme nvme0: failed to mark controller CONNECTING [ 25.932369] nvme nvme0: Removing after probe failure status: -16 This situation is able to be easily reproduced by: root@target:~# rmmod nvme && modprobe nvme && rmmod nvme Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:39 +02:00
Chaitanya Kulkarni	e71afda493	nvme-pci: set the errno on ctrl state change error This patch removes the confusing assignment of the variable result at the time of declaration and sets the value in error cases next to the places where the actual error is happening. Here we also set the result value to -ENODEV when we fail at the final ctrl state transition in nvme_reset_work(). Without this assignment result will hold 0 from nvme_setup_io_queue() and on failure 0 will be passed to he nvme_remove_dead_ctrl() from final state transition. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:38 +02:00
Minwoo Im	dad77d6390	nvme-pci: adjust irq max_vector using num_possible_cpus() If the "irq_queues" are greater than num_possible_cpus(), nvme_calc_irq_sets() can have irq set_size for HCTX_TYPE_DEFAULT greater than it can be afforded. 2039 affd->set_size[HCTX_TYPE_DEFAULT] = nrirqs - nr_read_queues; It might cause a WARN() from the irq_build_affinity_masks() like [1]: 220 if (nr_present < numvecs) 221 WARN_ON(nr_present + nr_others < numvecs); This patch prevents it from the WARN() by adjusting the max_vector value from the nvme_setup_irqs(). [1] WARN messages when modprobe nvme write_queues=32 poll_queues=0: root@target:~/nvme# nproc 8 root@target:~/nvme# modprobe nvme write_queues=32 poll_queues=0 [ 17.925326] nvme nvme0: pci function 0000:00:04.0 [ 17.940601] WARNING: CPU: 3 PID: 1030 at kernel/irq/affinity.c:221 irq_create_affinity_masks+0x222/0x330 [ 17.940602] Modules linked in: nvme nvme_core [last unloaded: nvme] [ 17.940605] CPU: 3 PID: 1030 Comm: kworker/u17:4 Tainted: G W 5.1.0+ #156 [ 17.940605] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 17.940608] Workqueue: nvme-reset-wq nvme_reset_work [nvme] [ 17.940609] RIP: 0010:irq_create_affinity_masks+0x222/0x330 [ 17.940611] Code: 4c 8d 4c 24 28 4c 8d 44 24 30 e8 c9 fa ff ff 89 44 24 18 e8 c0 38 fa ff 8b 44 24 18 44 8b 54 24 1c 5a 44 01 d0 41 39 c4 76 02 <0f> 0b 48 89 df 44 01 e5 e8 f1 ce 10 00 48 8b 34 24 44 89 f0 44 01 [ 17.940611] RSP: 0018:ffffc90002277c50 EFLAGS: 00010216 [ 17.940612] RAX: 0000000000000008 RBX: ffff88807ca48860 RCX: 0000000000000000 [ 17.940612] RDX: ffff88807bc03800 RSI: 0000000000000020 RDI: 0000000000000000 [ 17.940613] RBP: 0000000000000001 R08: ffffc90002277c78 R09: ffffc90002277c70 [ 17.940613] R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000020 [ 17.940614] R13: 0000000000025d08 R14: 0000000000000001 R15: ffff88807bc03800 [ 17.940614] FS: 0000000000000000(0000) GS:ffff88807db80000(0000) knlGS:0000000000000000 [ 17.940616] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 17.940617] CR2: 00005635e583f790 CR3: 000000000240a000 CR4: 00000000000006e0 [ 17.940617] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 17.940618] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 17.940618] Call Trace: [ 17.940622] __pci_enable_msix_range+0x215/0x540 [ 17.940623] ? kernfs_put+0x117/0x160 [ 17.940625] pci_alloc_irq_vectors_affinity+0x74/0x110 [ 17.940626] nvme_reset_work+0xc30/0x1397 [nvme] [ 17.940628] ? __switch_to_asm+0x34/0x70 [ 17.940628] ? __switch_to_asm+0x40/0x70 [ 17.940629] ? __switch_to_asm+0x34/0x70 [ 17.940630] ? __switch_to_asm+0x40/0x70 [ 17.940630] ? __switch_to_asm+0x34/0x70 [ 17.940631] ? __switch_to_asm+0x40/0x70 [ 17.940632] ? nvme_irq_check+0x30/0x30 [nvme] [ 17.940633] process_one_work+0x20b/0x3e0 [ 17.940634] worker_thread+0x1f9/0x3d0 [ 17.940635] ? cancel_delayed_work+0xa0/0xa0 [ 17.940636] kthread+0x117/0x120 [ 17.940637] ? kthread_stop+0xf0/0xf0 [ 17.940638] ret_from_fork+0x3a/0x50 [ 17.940639] ---[ end trace aca8a131361cd42a ]--- [ 17.942124] nvme nvme0: 7/1/0 default/read/poll queues Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:38 +02:00
Minwoo Im	483178f38c	nvme-pci: remove queue_count_ops for write_queues and poll_queues queue_count_set() seems like that it has been provided to limit the number of queue entries for write/poll queues. But, the queue_count_set() has been doing nothing but a parameter check even it has num_possible_cpus() which is nop. This patch removes entire queue_count_ops from the write_queues and poll_queues. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:38 +02:00
Minwoo Im	a232ea0ebf	nvme-pci: remove unnecessary zero for static var poll_queues will be zero even without zero initialization here. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:38 +02:00
Keith Busch	d916b1be94	nvme-pci: use host managed power state for suspend The nvme pci driver prepares its devices for power loss during suspend by shutting down the controllers. The power setting is deferred to pci driver's power management before the platform removes power. The suspend-to-idle mode, however, does not remove power. NVMe devices that implement host managed power settings can achieve lower power and better transition latencies than using generic PCI power settings. Try to use this feature if the platform is not involved with the suspend. If successful, restore the previous power state on resume. Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Tested-by: Mario Limonciello <mario.limonciello@dell.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> [hch: fixed the compilation for the !CONFIG_PM_SLEEP case] Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:38 +02:00
Minwoo Im	7a1f46e3f7	nvme: introduce nvme_is_fabrics to check fabrics cmd This patch introduces a nvme_is_fabrics() inline function to check whether or not the given command structure is for fabrics. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:38 +02:00
Keith Busch	1a87ee657c	nvme: export get and set features Future use intends to make use of both, so export these functions. And since their implementation is identical except for the opcode, provide a new function that implement both. [akinobu.mita@gmail.com>: fix line over 80 characters] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:38 +02:00
Anton Eidelman	2181e45561	nvme: fix possible io failures when removing multipathed ns When a shared namespace is removed, we call blk_cleanup_queue() when the device can still be accessed as the current path and this can result in submission to a dying queue. Hence, direct_make_request() called by our mpath device may fail (propagating the failure to userspace). Instead, we want to failover this I/O to a different path if one exists. Thus, before we cleanup the request queue, we make sure that the device is cleared from the current path nor it can be selected again as such. Fix this by: - clear the ns from the head->list and synchronize rcu to make sure there is no concurrent path search that restores it as the current path - clear the mpath current path in order to trigger a subsequent path search and sync srcu to wait for any ongoing request submissions - safely continue to namespace removal and blk_cleanup_queue Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:38 +02:00
James Smart	4bea364f16	nvme-fc: add message when creating new association When looking at console messages to troubleshoot, there are one maybe two messages before creation of the controller is complete. However, a lot of io takes place to reach that point. It's unclear when things have started. Add a message when the controller is attempting to create a new association. Thus we know what controller, between what host and remote port, and what NQN is being put into place for any subsequent success or failure messages. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Giridhar Malavali <gmalavali@marvell.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:38 +02:00
James Smart	41b194b843	lpfc: add sysfs interface to post NVME RSCN To support scenarios which aren't bound to nvmetcli add port scenarios, which is currently where the nvmet_fc transport invokes the discovery event callbacks, a syfs attribute is added to lpfc which can be written to cause an RSCN to be generated for the nport. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:38 +02:00
James Smart	6f2589f478	lpfc: add support for translating an RSCN rcv into a discovery rescan This patch updates RSCN receive processing to check for the remote port being an NVME port, and if so, invoke the nvme_fc callback to rescan the remote port. The rescan will generate a discovery udev event. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:38 +02:00
James Smart	ab723121a8	lpfc: add nvmet discovery_event op support This patch adds support for the nvmet discovery op. When the callback routine is called, the driver will call the routine to generate an RSCN to the port on the other end of the link. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:37 +02:00
James Smart	f60cb93bbf	lpfc: add support to generate RSCN events for nport This patch adds general RSCN support: - The ability to transmit an RSCN to the port on the other end of the link (regular port if pt2pt, or fabric controller if fabric). - And general recognition of an RSCN ELS when an ELS is received. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:37 +02:00
James Smart	4cf7c363b4	nvme-fcloop: add support for nvmet discovery_event op Update fcloop to support the discovery_event operation and invoke a nvme rescan. In a real fc adapter, this would generate an RSCN, which the host would receive and convert into a nvme rescan on the remote port specified in the rscn payload. Signed-off-by: James Smart <jsmart2021@gmail.com> [kbuild-bot: fcloop_tgt_discovery_evt can be static] Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:37 +02:00
James Smart	150d71f725	nvmet-fc: add transport discovery change event callback support This patch adds support for the nvmet discovery_change transport op. In turn, the transport adds it's own LLDD api callback discovery_event op to request the LLDD to generate an RSCN for the discovery change. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:37 +02:00
James Smart	9d09dd8d76	nvmet: add transport discovery change op Some transports, such as FC-NVME, support discovery controller change events without the use of a persistent discovery controller. FC receives events via RSCN from the FC Fabric Controller or subsystem FC port. This patch adds a nvmet transport op that is called whenever a discovery change event occurs in the nvmet layer. To facilitate the callback without adding another layer to cross into core.c to reference the transport ops, the port structure snapshots the transport ops when the port is enabled and clears them when disabled. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:37 +02:00

1 2 3 4 5 ...

841142 Commits