linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-26 12:52:30 +00:00

Author	SHA1	Message	Date
Kai-Heng Feng	9947d6a09c	nvme: relax APST default max latency to 100ms Christoph Hellwig suggests we should to make APST work out of the box. Hence relax the the default max latency to make them able to enter deepest power state on default. Here are id-ctrl excerpts from two high latency NVMes: vid : 0x14a4 ssvid : 0x1b4b mn : CX2-GB1024-Q11 NVMe LITEON 1024GB ps 3 : mp:0.1000W non-operational enlat:5000 exlat:5000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:- ps 4 : mp:0.0100W non-operational enlat:50000 exlat:100000 rrt:4 rrl:4 rwt:4 rwl:4 idle_power:- active_power:- vid : 0x15b7 ssvid : 0x1b4b mn : A400 NVMe SanDisk 512GB ps 3 : mp:0.0500W non-operational enlat:51000 exlat:10000 rrt:0 rrl:0 rwt:0 rwl:0 idle_power:- active_power:- ps 4 : mp:0.0055W non-operational enlat:1000000 exlat:100000 rrt:0 rrl:0 rwt:0 rwl:0 idle_power:- active_power:- Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-06-07 11:08:55 +02:00
Kai-Heng Feng	da87591bea	nvme: only consider exit latency when choosing useful non-op power states When a NVMe is in non-op states, the latency is exlat. The latency will be enlat + exlat only when the NVMe tries to transit from operational state right atfer it begins to transit to non-operational state, which should be a rare case. Therefore, as Andy Lutomirski suggests, use exlat only when deciding power states to trainsit to. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-06-07 11:08:54 +02:00
James Smart	24b7f0592f	nvme-fc: fix missing put reference on controller create failure The failure case, of a create controller request, called nvme_uninit_ctrl() but didn't do a put to allow the nvme controller to be deleted. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-06-07 11:08:53 +02:00
James Smart	f874d5d079	nvme-fc: on lldd/transport io error, terminate association Per FC-NVME, when lldd or transport detects an i/o error, the connection must be terminated, which in turn requires the association to be termianted. Currently the transport simply creates a nvme completion status of transport error and returns the io. The FC-NVME spec makes the mandate as initiator and host, depending on the error, can get out of sync on outstanding io counts (sqhd/sqtail). Implement the association teardown on lldd or transport detected errors. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>	2017-06-07 11:08:52 +02:00
Sagi Grimberg	e818a5b487	nvme-rdma: fast fail incoming requests while we reconnect When we encounter an transport/controller errors, error recovery kicks in which performs: 1. stops io/admin queues 2. moves transport queues out of LIVE state 3. fast fail pending io 4. schedule periodic reconnects. But we also need to fast fail incoming IO taht enters after we already scheduled. Given that our queue is not LIVE anymore, simply restart the request queues to fail in .queue_rq Reported-by: Alex Turin <alex@vastdata.com> Reported-by: shahar.salzman <shahar.salzman@gmail.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org	2017-06-07 11:08:51 +02:00
Rakesh Pandit	82b057caef	nvme-pci: fix multiple ctrl removal scheduling Commit `c5f6ce97c1` tries to address multiple resets but fails as work_busy doesn't involve any synchronization and can fail. This is reproducible easily as can be seen by WARNING below which is triggered with line: WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING) Allowing multiple resets can result in multiple controller removal as well if different conditions inside nvme_reset_work fail and which might deadlock on device_release_driver. [ 480.327007] WARNING: CPU: 3 PID: 150 at drivers/nvme/host/pci.c:1900 nvme_reset_work+0x36c/0xec0 [ 480.327008] Modules linked in: rfcomm fuse nf_conntrack_netbios_ns nf_conntrack_broadcast... [ 480.327044] btusb videobuf2_core ghash_clmulni_intel snd_hwdep cfg80211 acer_wmi hci_uart.. [ 480.327065] CPU: 3 PID: 150 Comm: kworker/u16:2 Not tainted 4.12.0-rc1+ #13 [ 480.327065] Hardware name: Acer Predator G9-591/Mustang_SLS, BIOS V1.10 03/03/2016 [ 480.327066] Workqueue: nvme nvme_reset_work [ 480.327067] task: ffff880498ad8000 task.stack: ffffc90002218000 [ 480.327068] RIP: 0010:nvme_reset_work+0x36c/0xec0 [ 480.327069] RSP: 0018:ffffc9000221bdb8 EFLAGS: 00010246 [ 480.327070] RAX: 0000000000460000 RBX: ffff880498a98128 RCX: dead000000000200 [ 480.327070] RDX: 0000000000000001 RSI: ffff8804b1028020 RDI: ffff880498a98128 [ 480.327071] RBP: ffffc9000221be50 R08: 0000000000000000 R09: 0000000000000000 [ 480.327071] R10: ffffc90001963ce8 R11: 000000000000020d R12: ffff880498a98000 [ 480.327072] R13: ffff880498a53500 R14: ffff880498a98130 R15: ffff880498a98128 [ 480.327072] FS: 0000000000000000(0000) GS:ffff8804c1cc0000(0000) knlGS:0000000000000000 [ 480.327073] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 480.327074] CR2: 00007ffcf3c37f78 CR3: 0000000001e09000 CR4: 00000000003406e0 [ 480.327074] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 480.327075] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 480.327075] Call Trace: [ 480.327079] ? __switch_to+0x227/0x400 [ 480.327081] process_one_work+0x18c/0x3a0 [ 480.327082] worker_thread+0x4e/0x3b0 [ 480.327084] kthread+0x109/0x140 [ 480.327085] ? process_one_work+0x3a0/0x3a0 [ 480.327087] ? kthread_park+0x60/0x60 [ 480.327102] ret_from_fork+0x2c/0x40 [ 480.327103] Code: e8 5a dc ff ff 85 c0 41 89 c1 0f..... This patch addresses the problem by using state of controller to decide whether reset should be queued or not as state change is synchronizated using controller spinlock. Also cancel_work_sync is used to make sure remove cancels the reset_work and waits for it to finish. This patch also changes return value from -ENODEV to more appropriate -EBUSY if nvme_reset fails to change state. Fixes: `c5f6ce97c1` ("nvme: don't schedule multiple resets") Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-06-07 11:08:50 +02:00
Ming Lei	82654b6b8e	nvme: fix hang in remove path We need to start admin queues too in nvme_kill_queues() for avoiding hang in remove path[1]. This patch is very similar with 806f026f9b901eaf(nvme: use blk_mq_start_hw_queues() in nvme_kill_queues()). [1] hang stack trace [<ffffffff813c9716>] blk_execute_rq+0x56/0x80 [<ffffffff815cb6e9>] __nvme_submit_sync_cmd+0x89/0xf0 [<ffffffff815ce7be>] nvme_set_features+0x5e/0x90 [<ffffffff815ce9f6>] nvme_configure_apst+0x166/0x200 [<ffffffff815cef45>] nvme_set_latency_tolerance+0x35/0x50 [<ffffffff8157bd11>] apply_constraint+0xb1/0xc0 [<ffffffff8157cbb4>] dev_pm_qos_constraints_destroy+0xf4/0x1f0 [<ffffffff8157b44a>] dpm_sysfs_remove+0x2a/0x60 [<ffffffff8156d951>] device_del+0x101/0x320 [<ffffffff8156db8a>] device_unregister+0x1a/0x60 [<ffffffff8156dc4c>] device_destroy+0x3c/0x50 [<ffffffff815cd295>] nvme_uninit_ctrl+0x45/0xa0 [<ffffffff815d4858>] nvme_remove+0x78/0x110 [<ffffffff81452b69>] pci_device_remove+0x39/0xb0 [<ffffffff81572935>] device_release_driver_internal+0x155/0x210 [<ffffffff81572a02>] device_release_driver+0x12/0x20 [<ffffffff815d36fb>] nvme_remove_dead_ctrl_work+0x6b/0x70 [<ffffffff810bf3bc>] process_one_work+0x18c/0x3a0 [<ffffffff810bf61e>] worker_thread+0x4e/0x3b0 [<ffffffff810c5ac9>] kthread+0x109/0x140 [<ffffffff8185800c>] ret_from_fork+0x2c/0x40 [<ffffffffffffffff>] 0xffffffffffffffff Fixes: c5552fde102fc("nvme: Enable autonomous power state transitions") Reported-by: Rakesh Pandit <rakesh@tuxera.com> Tested-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-06-07 11:08:49 +02:00
Andy Lutomirski	50af47d04c	nvme: Quirk APST on Intel 600P/P3100 devices They have known firmware bugs. A fix is apparently in the works -- once fixed firmware is available, someone from Intel (Hi, Keith!) can adjust the quirk accordingly. Cc: stable@vger.kernel.org # v4.11 Cc: Kai-Heng Feng <kai.heng.feng@canonical.com> Cc: Mario Limonciello <mario_limonciello@dell.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-05-26 11:53:02 +03:00
Christoph Hellwig	c81bfba998	nvme: only setup block integrity if supported by the driver Currently only the PCIe driver supports metadata, so we should not claim integrity support for the other drivers. This prevents nasty crashes with targets that advertise metadata support on fabrics. Also use the opportunity to factor out some code into a separate helper that isn't even compiled if CONFIG_BLK_DEV_INTEGRITY is disabled. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com>	2017-05-26 09:54:23 +03:00
Christoph Hellwig	d3d5b87ddd	nvme: replace is_flags field in nvme_ctrl_ops with a flags field So that we can have more flags for transport-specific behavior. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com>	2017-05-26 09:54:16 +03:00
Christoph Hellwig	9bdcfb10f2	nvme-pci: consistencly use ctrl->device for logging This is what most of the code already does and gives much more useful prefixes than the device embedded in the pci_dev. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com>	2017-05-26 09:54:07 +03:00
James Smart	2cb657bc02	nvme_fc: remove extra controller reference taken on reconnect fix extra controller reference taken on reconnect by moving reference to initial controller create Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-05-22 20:55:29 +02:00
James Smart	e392e1f1f4	nvme_fc: correct nvme status set on abort correct nvme status set on abort. Patch that changed status to being actual nvme status crossed in the night with the patch that added abort values. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-05-22 20:55:29 +02:00
James Smart	589ff7753b	nvme_fc: set logging level on resets/deletes Per the review by Sagi on: http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html Looked at existing warn vs info vs err dev_xxx levels for the messages printed on reconnects and deletes: - Resets due to error and resets transitioned to deletes are dev_warn - Other reset/disconnect messages are dev_info - Removed chatty io queue related messages Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-05-22 20:55:29 +02:00
James Smart	a5321aa5ef	nvme_fc: revise comment on teardown Per the recommendation by Sagi on: http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html An extra reference was pointed out. There's no issue with the references, but rather a literal interpretation of what the comment is saying. Reword the comment to avoid confusion. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-05-22 20:55:29 +02:00
James Smart	5bbecdbc8e	nvme_fc: Support ctrl_loss_tmo Sync with Sagi's recent addition of ctrl_loss_tmo in the core fabrics layer. Remove local connect limits and connect_attempts variable. Use fabrics new nr_connects variable and use of nvmf_should_reconnect() Refactor duplicate reconnect failure code. Addresses review comment by Sagi on controller reset support: http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-05-22 20:55:29 +02:00
James Smart	0ce872bf8b	nvme_fc: get rid of local reconnect_delay Remove the local copy of reconnect_delay. Use the value in the controller options directly. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-05-22 20:55:29 +02:00
Ming Lei	986f75c876	nvme: avoid to use blk_mq_abort_requeue_list() NVMe may add request into requeue list simply and not kick off the requeue if hw queues are stopped. Then blk_mq_abort_requeue_list() is called in both nvme_kill_queues() and nvme_ns_remove() for dealing with this issue. Unfortunately blk_mq_abort_requeue_list() is absolutely a race maker, for example, one request may be requeued during the aborting. So this patch just calls blk_mq_kick_requeue_list() in nvme_kill_queues() to handle this issue like what nvme_start_queues() does. Now all requests in requeue list when queues are stopped will be handled by blk_mq_kick_requeue_list() when queues are restarted, either in nvme_start_queues() or in nvme_kill_queues(). Cc: stable@vger.kernel.org Reported-by: Zhang Yi <yizhan@redhat.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-05-22 20:50:10 +02:00
Ming Lei	806f026f9b	nvme: use blk_mq_start_hw_queues() in nvme_kill_queues() Inside nvme_kill_queues(), we have to start hw queues for draining requests in sw queues, .dispatch list and requeue list, so use blk_mq_start_hw_queues() instead of blk_mq_start_stopped_hw_queues() which only run queues if queues are stopped, but the queues may have been started already, for example nvme_start_queues() is called in reset work function. blk_mq_start_hw_queues() run hw queues in current context, instead of running asynchronously like before. Given nvme_kill_queues() is run from either remove context or reset worker context, both are fine to run hw queue directly. And the mutex of namespaces_mutex isn't a problem too becasue nvme_start_freeze() runs hw queue in this way already. Cc: stable@vger.kernel.org Reported-by: Zhang Yi <yizhan@redhat.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-05-22 20:50:09 +02:00
Marta Rybczynska	0544f5494a	nvme-rdma: support devices with queue size < 32 In the case of small NVMe-oF queue size (<32) we may enter a deadlock caused by the fact that the IB completions aren't sent waiting for 32 and the send queue will fill up. The error is seen as (using mlx5): [ 2048.693355] mlx5_0:mlx5_ib_post_send:3765:(pid 7273): [ 2048.693360] nvme nvme1: nvme_rdma_post_send failed with error code -12 This patch changes the way the signaling is done so that it depends on the queue depth now. The magic define has been removed completely. Cc: stable@vger.kernel.org Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu> Signed-off-by: Samuel Jones <sjones@kalray.eu> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-05-22 20:49:49 +02:00
Linus Torvalds	894e21642d	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A small collection of fixes that should go into this cycle. - a pull request from Christoph for NVMe, which ended up being manually applied to avoid pulling in newer bits in master. Mostly fibre channel fixes from James, but also a few fixes from Jon and Vijay - a pull request from Konrad, with just a single fix for xen-blkback from Gustavo. - a fuseblk bdi fix from Jan, fixing a regression in this series with the dynamic backing devices. - a blktrace fix from Shaohua, replacing sscanf() with kstrtoull(). - a request leak fix for drbd from Lars, fixing a regression in the last series with the kref changes. This will go to stable as well" * 'for-linus' of git://git.kernel.dk/linux-block: nvmet: release the sq ref on rdma read errors nvmet-fc: remove target cpu scheduling flag nvme-fc: stop queues on error detection nvme-fc: require target or discovery role for fc-nvme targets nvme-fc: correct port role bits nvme: unmap CMB and remove sysfs file in reset path blktrace: fix integer parse fuseblk: Fix warning in super_setup_bdi_name() block: xen-blkback: add null check to avoid null pointer dereference drbd: fix request leak introduced by locking/atomic, kref: Kill kref_sub()	2017-05-20 16:12:30 -07:00
James Smart	2952a879ba	nvme-fc: stop queues on error detection Per the recommendation by Sagi on: http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html Rather than waiting for reset work thread to stop queues and abort the ios, immediately stop the queues on error detection. Reset thread will restop the queues (as it's called on other paths), but it does not appear to have a side effect. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-05-20 10:11:34 -06:00
James Smart	85e6a6adf8	nvme-fc: require target or discovery role for fc-nvme targets In order to create an association, the remoteport must be serving either a target role or a discovery role. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-05-20 10:11:34 -06:00
Jon Derrick	f63572dff1	nvme: unmap CMB and remove sysfs file in reset path CMB doesn't get unmapped until removal while getting remapped on every reset. Add the unmapping and sysfs file removal to the reset path in nvme_pci_disable to match the mapping path in nvme_pci_enable. Fixes: `202021c1a` ("nvme : Add sysfs entry for NVMe CMBs when appropriate") Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Acked-by: Keith Busch <keith.busch@intel.com> Reviewed-By: Stephen Bates <sbates@raithlin.com> Cc: <stable@vger.kernel.org> # 4.9+ Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-05-20 10:11:34 -06:00
Linus Torvalds	55a1ab56c7	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A smaller collection of fixes that should go into -rc1. This contains: - A fix from Christoph, fixing a regression with the WRITE_SAME and partial completions. Caused a BUG() on ppc. - Fixup for __blk_mq_stop_hw_queues(), it should be static. From Colin. - Removal of dmesg error messages on elevator switching, when invoked from sysfs. From me. - Fix for blk-stat, using this_cpu_ptr() in a section only protected by rcu_read_lock(). This breaks when PREEMPT_RCU is enabled. From me. - Two fixes for BFQ from Paolo, one fixing a crash and one updating the documentation. - An error handling lightnvm memory leak, from Rakesh. - The previous blk-mq hot unplug lock reversal depends on the CPU hotplug rework that isn't in mainline yet. This caused a lockdep splat when people unplugged CPUs with blk-mq devices. From Wanpeng. - A regression fix for DIF/DIX on blk-mq. From Wen" * 'for-linus' of git://git.kernel.dk/linux-block: block: handle partial completions for special payload requests blk-mq: NVMe 512B/4K+T10 DIF/DIX format returns I/O error on dd with split op blk-stat: don't use this_cpu_ptr() in a preemptable section elevator: remove redundant warnings on IO scheduler switch block, bfq: stress that low_latency must be off to get max throughput block, bfq: use pointer entity->sched_data only if set nvme: lightnvm: fix memory leak blk-mq: make __blk_mq_stop_hw_queues static lightnvm: remove unused rq parameter of nvme_nvm_rqtocmd() to kill warning block/mq: fix potential deadlock during cpu hotplug	2017-05-11 11:01:56 -07:00
Rakesh Pandit	fba704b494	nvme: lightnvm: fix memory leak Free up kmalloc allocated memory if failure happens while handling L2P table transfer in nvme_nvm_get_l2p_tbl. Fixes: `8e79b5cb` ("lightnvm: move block provisioning to targets") Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-05-10 07:39:43 -06:00
Linus Torvalds	857f864014	pci-v4.12-changes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJZEHmsAAoJEFmIoMA60/r88SgQAJbFddueb0+DfJ+USDud4b/Z akfS+G1UAm+TgtMyh1wM49dHzFssp36uWJxtWI+bPqBzuy94PMCbz7JVUV28gX9G tFhFuc5YH94I/3y85rbZnolb6uZN9MhLjzTFqDC9ilW6HFqmwK4t4wlHSCjQN1St svLYvs2G6n6/VK3Fre7/wOvdZ1erG4Qod+kn5Tx3K5TQydmRlaSBfK+DRANuDBkM KzGO7Bkc/Cx8hb9pHmaey/wxmNrrgmVjTtWrEnb2tEq833zP4h6GhUIJEKodMSi5 gXPNZgKlu3n5L592M0UCh4EoHejzkv9wrcsoDm+djmsc5Zg2Howq4kAdHP8k4hUG 0gt8n0ni9vhJN56jikrGi7cAdHCKSNnx2Ue/qTCbX0ncB3XUMuJxJwCsgW/6wa9f oU7tRtTS03UltnKoFAcyYclS4TaSY4SA4ySaK6Hi+cRkdVFDdyHQYbHHNSU7MsA+ IS2tXvGoIdSYyrZMHSRcl2rRTfYQUkmPEvBF3LvqZr32M4mJMmUNAPLZaly373ZE iwq0ZJlrLeM0cqdFIG3S60RtJyQk/HBN1NMqrYHArWOxvWIgNd5F8NCsTTxY3wU3 IxgBIuUFcbVwVkqEHGs8K5AvB3oghqdnA3eGOV79799eMtLn3LOvyIlpHMSw9WUq ags00JtMLitfNPBH3eSl =eE4D -----END PGP SIGNATURE----- Merge tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - add framework for supporting PCIe devices in Endpoint mode (Kishon Vijay Abraham I) - use non-postable PCI config space mappings when possible (Lorenzo Pieralisi) - clean up and unify mmap of PCI BARs (David Woodhouse) - export and unify Function Level Reset support (Christoph Hellwig) - avoid FLR for Intel 82579 NICs (Sasha Neftin) - add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig) - short-circuit config access failures for disconnected devices (Keith Busch) - remove D3 sleep delay when possible (Adrian Hunter) - freeze PME scan before suspending devices (Lukas Wunner) - stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava) - disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann) - add arch-specific alignment control to improve device passthrough by avoiding multiple BARs in a page (Yongji Xie) - add sysfs sriov_drivers_autoprobe to control VF driver binding (Bodong Wang) - allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas) - fix crashes when unbinding host controllers that don't support removal (Brian Norris) - add driver for MicroSemi Switchtec management interface (Logan Gunthorpe) - add driver for Faraday Technology FTPCI100 host bridge (Linus Walleij) - add i.MX7D support (Andrey Smirnov) - use generic MSI support for Aardvark (Thomas Petazzoni) - make Rockchip driver modular (Brian Norris) - advertise 128-byte Read Completion Boundary support for Rockchip (Shawn Lin) - advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin) - convert atomic_t to refcount_t in HV driver (Elena Reshetova) - add CPU IRQ affinity in HV driver (K. Y. Srinivasan) - fix PCI bus removal in HV driver (Long Li) - add support for ThunderX2 DMA alias topology (Jayachandran C) - add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki) - add ITE 8893 bridge DMA alias quirk (Jarod Wilson) - restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices (Manish Jaggi) * tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits) PCI: Don't allow unbinding host controllers that aren't prepared ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP MAINTAINERS: Add PCI Endpoint maintainer Documentation: PCI: Add userguide for PCI endpoint test function tools: PCI: Add sample test script to invoke pcitest tools: PCI: Add a userspace tool to test PCI endpoint Documentation: misc-devices: Add Documentation for pci-endpoint-test driver misc: Add host side PCI driver for PCI test function device PCI: Add device IDs for DRA74x and DRA72x dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access PCI: dwc: dra7xx: Workaround for errata id i870 dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode PCI: dwc: dra7xx: Add EP mode support PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently dt-bindings: PCI: Add DT bindings for PCI designware EP mode PCI: dwc: designware: Add EP mode support Documentation: PCI: Add binding documentation for pci-test endpoint function ixgbe: Use pcie_flr() instead of duplicating it IB/hfi1: Use pcie_flr() instead of duplicating it PCI: imx6: Fix spelling mistake: "contol" -> "control" ...	2017-05-08 19:03:25 -07:00
Geert Uytterhoeven	629b1b2e0e	lightnvm: remove unused rq parameter of nvme_nvm_rqtocmd() to kill warning With gcc 4.1.2: drivers/nvme/host/lightnvm.c: In function ‘nvme_nvm_submit_io’: drivers/nvme/host/lightnvm.c:498: warning: ‘rq’ is used uninitialized in this function Indeed, since commit `2e13f33a24` ("lightnvm: create cmd before allocating request"), the request is passed to nvme_nvm_rqtocmd() before it is allocated. Fortunately, as of commit `91276162de` ("lightnvm: refactor end_io functions for sync"), nvme_nvm_rqtocmd () no longer uses the passed request, so this warning is a false positive. Drop the unused parameter to clean up the code and kill the warning. Fixes: `2e13f33a24` ("lightnvm: create cmd before allocating request") Fixes: `91276162de` ("lightnvm: refactor end_io functions for sync") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-05-07 19:52:45 -06:00
Javier González	2e13f33a24	lightnvm: create cmd before allocating request Create nvme command before allocating a request using nvme_alloc_request, which uses the command direction. Up until now, the command has been zeroized, so all commands have been allocated as a read operation. Signed-off-by: Javier González <javier@cnexlabs.com> Reviewed-by: Matias Bjørling <matias@cnexlabs.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-05-04 07:53:04 -06:00
Christoph Hellwig	d6296d39e9	blk-mq: update ->init_request and ->exit_request prototypes Remove the request_idx parameter, which can't be used safely now that we support I/O schedulers with blk-mq. Except for a superflous check in mtip32xx it was unused anyway. Also pass the tag_set instead of just the driver data - this allows drivers to avoid some code duplication in a follow on cleanup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-05-02 07:52:08 -06:00
Jens Axboe	b06e13c38d	Merge branch 'nvme-4.12' of git://git.infradead.org/nvme into for-4.12/post-merge Christoph writes: "A couple more updates for 4.12. The biggest pile is fc and lpfc updates from James, but there are various small fixes and cleanups as well." Fixes up a few merge issues, and also a warning in lpfc_nvmet_rcv_unsol_abort() if CONFIG_NVME_TARGET_FC isn't enabled. Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-27 11:33:01 -06:00
Christoph Hellwig	7569b90a22	nvme-scsi: remove nvme_trans_security_protocol This function just returns the same error code and sense data as the default statement in the switch in the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com>	2017-04-27 08:39:32 +02:00
Christoph Hellwig	25d9baa475	nvme-lightnvm: add missing endianess conversion in nvme_nvm_end_io Found by sparse. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matias Bjørling <matias@cnexlabs.com>	2017-04-25 20:01:15 +02:00
Jon Derrick	7fad1fd46c	nvme-scsi: Consider LBA format in IO splitting calculation The current command submission code uses a sector-based value when considering the maximum number of blocks per command. With a 4k-formatted namespace and a command exceeding max hardware limits, this calculation doesn't split IOs which should be split and fails in the nvme layer. This patch fixes that calculation and enables IO splitting in these circumstances. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Reviewed-by: Jens Axboe <axboe@fb.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-04-25 20:01:00 +02:00
Ewan D. Milne	de41447aac	nvme-fc: avoid memory corruption caused by calling nvmf_free_options() twice Do not call nvmf_free_options() from the nvme_fc_ctlr destructor if nvme_fc_create_ctrl() returns an error, because nvmf_create_ctrl() frees the options when an error is returned. Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-04-25 20:00:59 +02:00
Andy Lutomirski	c35e30b472	nvme: Add nvme_core.force_apst to ignore the NO_APST quirk We're probably going to be stuck quirking APST off on an over-broad range of devices for 4.11. Let's make it easy to override the quirk for testing. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-24 22:03:46 -06:00
Andy Lutomirski	fb0dc3993b	nvme: Display raw APST configuration via DYNAMIC_DEBUG Debugging APST is currently a bit of a pain. This gives optional simple log messages that describe the APST state. The easiest way to use this is probably with the nvme_core.dyndbg=+p module parameter. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-24 22:03:46 -06:00
Andy Lutomirski	76e4ad09a3	nvme: Fix APST comment There was a typo in the description of the timeout heuristic. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-24 22:03:46 -06:00
Jens Axboe	d9fd363a6c	Merge branch 'master' into for-4.12/post-merge	2017-04-24 22:03:14 -06:00
Christoph Hellwig	baee29ac17	nvme-fc: mark two symbols static Found by sparse. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>	2017-04-24 09:18:22 +02:00
James Smart	61bff8ef00	nvme_fc: add controller reset support This patch actually does quite a few things. When looking to add controller reset support, the organization modeled after rdma was very fragmented. rdma duplicates the reset and teardown paths and does different things to the block layer on the two paths. The code to build up the controller is also duplicated between the initial creation and the reset/error recovery paths. So I decided to make this sane. I reorganized the controller creation and teardown so that there is a connect path and a disconnect path. Initial creation obviously uses the connect path. Controller teardown will use the disconnect path, followed last access code. Controller reset will use the disconnect path to stop operation, and then the connect path to re-establish the controller. Along the way, several things were fixed - aens were not properly set up. They are allocated differently from the per-request structure on the blk queues. - aens were oddly torn down. the prior patch corrected to abort, but we still need to dma unmap and free relative elements. - missed a few ref counting points: in aen completion and on i/o's that fail - controller initial create failure paths were still confused vs teardown before converting to ref counting vs after we convert to refcounting. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-04-24 09:16:29 +02:00
James Smart	78a7ac260e	nvme_fc: add aen abort to teardown Add abort support for aens. Commonized the op abort to apply to aen or real ios (caused some reorg/routine movement). Abort path sets termination flag in prep for next patch that will be watching i/o abort completion before proceeding with controller teardown. Now that we're aborting aens, the "exit" code that simply cleared out their context no longer applies. Also clarified how we detect an AEN vs a normal io - by a flag, not by whether a rq exists or the a rqno is out of range. Note: saw some interesting cases where if the queues are stopped and we're waiting for the aborts, the core layer can call the complete_rq callback for the io. So the io completion synchronizes link side completion with possible blk layer completion under error. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-04-24 09:16:03 +02:00
James Smart	458f280d71	nvme_fc: fix command id check The code validates the command_id in the response to the original sqe command. But prior code was using the rq->rqno as the sqe command id. The core layer overwrites what the transport set there originally. Use the actual sqe content. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-04-24 09:15:04 +02:00
Junxiong Guan	e02ab02304	nvme: let dm-mpath distinguish nvme error codes Currently most IOs which return the nvme error codes are retried on the other path if those IOs returns EIO from NVMe driver. This patch let Multipath distinguish nvme media error codes and some generic or cmd-specific nvme error codes so that multipath will not retry those kinds of IO, to save bandwidth. Signed-off-by: Junxiong Guan <guanjunxiong@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-04-21 16:41:56 +02:00
Keith Busch	7776db1ccc	nvme/pci: Poll CQ on timeout If an IO timeout occurs, it's helpful to know if the controller did not post a completion or the driver missed an interrupt. While we never expect the latter, this patch will make it possible to tell the difference so we don't have to guess. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>	2017-04-21 16:41:55 +02:00
James Smart	8d64daf7dc	nvme_fc: Add ls aborts on remote port teardown remoteport teardown never aborted the LS opertions. Add support. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2017-04-21 16:41:53 +02:00
James Smart	c913a8b0d4	nvme_fc: Move LS's to rport Link LS's on the remoteport rather than the controller. LS's are between nport's. Makes more sense, especially on async teardown where the controller is torn down regardless of the LS (LS is more of a notifier to the target of the teardown), to have them on the remoteport. While revising ls send/done routines, issues were seen relative to refcounting and cleanup, especially in async path. Reworked these code paths. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2017-04-21 16:41:52 +02:00
Helen Koike	f9f38e3338	nvme: improve performance for virtual NVMe devices This change provides a mechanism to reduce the number of MMIO doorbell writes for the NVMe driver. When running in a virtualized environment like QEMU, the cost of an MMIO is quite hefy here. The main idea for the patch is provide the device two memory location locations: 1) to store the doorbell values so they can be lookup without the doorbell MMIO write 2) to store an event index. I believe the doorbell value is obvious, the event index not so much. Similar to the virtio specification, the virtual device can tell the driver (guest OS) not to write MMIO unless you are writing past this value. FYI: doorbell values are written by the nvme driver (guest OS) and the event index is written by the virtual device (host OS). The patch implements a new admin command that will communicate where these two memory locations reside. If the command fails, the nvme driver will work as before without any optimizations. Contributions: Eric Northup <digitaleric@google.com> Frank Swiderski <fes@google.com> Ted Tso <tytso@mit.edu> Keith Busch <keith.busch@intel.com> Just to give an idea on the performance boost with the vendor extension: Running fio [1], a stock NVMe driver I get about 200K read IOPs with my vendor patch I get about 1000K read IOPs. This was running with a null device i.e. the backing device simply returned success on every read IO request. [1] Running on a 4 core machine: fio --time_based --name=benchmark --runtime=30 --filename=/dev/nvme0n1 --nrfiles=1 --ioengine=libaio --iodepth=32 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randread --blocksize=4k --randrepeat=false Signed-off-by: Rob Nelson <rlnelson@google.com> [mlin: port for upstream] Signed-off-by: Ming Lin <mlin@kernel.org> [koike: updated for upstream] Signed-off-by: Helen Koike <helen.koike@collabora.co.uk> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com>	2017-04-21 16:41:47 +02:00
Keith Busch	81c1cd9835	nvme/pci: Don't set reserved SQ create flags The QPRIO field is only valid if weighted round robin arbitration is used, and this driver doesn't enable that controller configuration option. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>	2017-04-21 16:41:46 +02:00
Andy Lutomirski	be56945c4e	nvme: Quirk APST off on "THNSF5256GPUK TOSHIBA" There's a report that it malfunctions with APST on. See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184 Cc: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-20 14:42:10 -06:00

1 2 3 4 5 ...

460 Commits