During code reviews duplicate code sections were found to determine the WQ
Create version. The duplication was potentially overriding logic that
validated page size.
Link: https://lore.kernel.org/r/20201020202719.54726-6-james.smart@broadcom.com
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The following call trace was seen during HBA reset testing:
BUG: scheduling while atomic: swapper/2/0/0x10000100
...
Call Trace:
dump_stack+0x19/0x1b
__schedule_bug+0x64/0x72
__schedule+0x782/0x840
__cond_resched+0x26/0x30
_cond_resched+0x3a/0x50
mempool_alloc+0xa0/0x170
lpfc_unreg_rpi+0x151/0x630 [lpfc]
lpfc_sli_abts_recover_port+0x171/0x190 [lpfc]
lpfc_sli4_abts_err_handler+0xb2/0x1f0 [lpfc]
lpfc_sli4_io_xri_aborted+0x256/0x300 [lpfc]
lpfc_sli4_sp_handle_abort_xri_wcqe.isra.51+0xa3/0x190 [lpfc]
lpfc_sli4_fp_handle_cqe+0x89/0x4d0 [lpfc]
__lpfc_sli4_process_cq+0xdb/0x2e0 [lpfc]
__lpfc_sli4_hba_process_cq+0x41/0x100 [lpfc]
lpfc_cq_poll_hdler+0x1a/0x30 [lpfc]
irq_poll_softirq+0xc7/0x100
__do_softirq+0xf5/0x280
call_softirq+0x1c/0x30
do_softirq+0x65/0xa0
irq_exit+0x105/0x110
do_IRQ+0x56/0xf0
common_interrupt+0x16a/0x16a
With the conversion to blk_io_poll for better interrupt latency in normal
cases, it introduced this code path, executed when I/O aborts or logouts
are seen, which attempts to allocate memory for a mailbox command to be
issued. The allocation is GFP_KERNEL, thus it could attempt to sleep.
Fix by creating a work element that performs the event handling for the
remote port. This will have the mailbox commands and other items performed
in the work element, not the irq. A much better method as the "irq" routine
does not stall while performing all this deep handling code.
Ensure that allocation failures are handled and send LOGO on failure.
Additionally, enlarge the mailbox memory pool to reduce the possibility of
additional allocation in this path.
Link: https://lore.kernel.org/r/20201020202719.54726-3-james.smart@broadcom.com
Fixes: 317aeb83c9 ("scsi: lpfc: Add blk_io_poll support for latency improvment")
Cc: <stable@vger.kernel.org> # v5.9+
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The following calltrace was seen:
BUG: sleeping function called from invalid context at mm/slab.h:494
...
Call Trace:
dump_stack+0x9a/0xf0
___might_sleep.cold.63+0x13d/0x178
slab_pre_alloc_hook+0x6a/0x90
kmem_cache_alloc_trace+0x3a/0x2d0
lpfc_sli4_nvmet_alloc+0x4c/0x280 [lpfc]
lpfc_post_rq_buffer+0x2e7/0xa60 [lpfc]
lpfc_sli4_hba_setup+0x6b4c/0xa4b0 [lpfc]
lpfc_pci_probe_one_s4.isra.15+0x14f8/0x2280 [lpfc]
lpfc_pci_probe_one+0x260/0x2880 [lpfc]
local_pci_probe+0xd4/0x180
work_for_cpu_fn+0x51/0xa0
process_one_work+0x8f0/0x17b0
worker_thread+0x536/0xb50
kthread+0x30c/0x3d0
ret_from_fork+0x3a/0x50
A prior patch introduced a spin_lock_irqsave(hbalock) in the
lpfc_post_rq_buffer() routine. Call trace is seen as the hbalock is held
with interrupts disabled during a GFP_KERNEL allocation in
lpfc_sli4_nvmet_alloc().
Fix by reordering locking so that hbalock not held when calling
sli4_nvmet_alloc() (aka rqb_buf_list()).
Link: https://lore.kernel.org/r/20201020202719.54726-2-james.smart@broadcom.com
Fixes: 411de511c6 ("scsi: lpfc: Fix RQ empty firmware trap")
Cc: <stable@vger.kernel.org> # v4.17+
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While mds diagnostic tests are running, if the driver is requested to be
unloaded, oops or hangs are observed. The driver doesn't terminate the
processing of diag frames when the unload is started. As such: oops may be
seen for __lpfc_sli_release_iocbq_s4 because ring memory is referenced that
was already freed; or hangs see in lpfc_nvme_wait_for_io_drain as ios no
longer complete.
If unloading, don't process diag frames. Just clean them up.
Link: https://lore.kernel.org/r/20200803210229.23063-5-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
'pg_addr' is only used when CONFIG_X86 is defined. So only declare it if
CONFIG_X86 is defined.
Fixes the following W=1 kernel build warning(s):
drivers/scsi/lpfc/lpfc_sli.c: In function ‘lpfc_wq_create’:
drivers/scsi/lpfc/lpfc_sli.c:15813:16: warning: unused variable ‘pg_addr’ [-Wunused-variable]
15813 | unsigned long pg_addr;
| ^~~~~~~
Link: https://lore.kernel.org/r/20200721164148.2617584-37-lee.jones@linaro.org
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes the following W=1 kernel build warning(s):
drivers/scsi/lpfc/lpfc_sli.c:257: warning: Function parameter or member 'mqe' not described in 'lpfc_sli4_mq_put'
drivers/scsi/lpfc/lpfc_sli.c:257: warning: Excess function parameter 'wqe' description in 'lpfc_sli4_mq_put'
drivers/scsi/lpfc/lpfc_sli.c:675: warning: Function parameter or member 'hq' not described in 'lpfc_sli4_rq_put'
drivers/scsi/lpfc/lpfc_sli.c:675: warning: Function parameter or member 'dq' not described in 'lpfc_sli4_rq_put'
drivers/scsi/lpfc/lpfc_sli.c:675: warning: Function parameter or member 'hrqe' not described in 'lpfc_sli4_rq_put'
drivers/scsi/lpfc/lpfc_sli.c:675: warning: Function parameter or member 'drqe' not described in 'lpfc_sli4_rq_put'
drivers/scsi/lpfc/lpfc_sli.c:675: warning: Excess function parameter 'q' description in 'lpfc_sli4_rq_put'
drivers/scsi/lpfc/lpfc_sli.c:675: warning: Excess function parameter 'wqe' description in 'lpfc_sli4_rq_put'
drivers/scsi/lpfc/lpfc_sli.c:738: warning: Function parameter or member 'hq' not described in 'lpfc_sli4_rq_release'
drivers/scsi/lpfc/lpfc_sli.c:738: warning: Function parameter or member 'dq' not described in 'lpfc_sli4_rq_release'
drivers/scsi/lpfc/lpfc_sli.c:738: warning: Excess function parameter 'q' description in 'lpfc_sli4_rq_release'
drivers/scsi/lpfc/lpfc_sli.c:1021: warning: Function parameter or member 'xritag' not described in 'lpfc_test_rrq_active'
drivers/scsi/lpfc/lpfc_sli.c:1132: warning: Function parameter or member 'piocbq' not described in '__lpfc_sli_get_els_sglq'
drivers/scsi/lpfc/lpfc_sli.c:1132: warning: Excess function parameter 'piocb' description in '__lpfc_sli_get_els_sglq'
drivers/scsi/lpfc/lpfc_sli.c:1207: warning: Function parameter or member 'piocbq' not described in '__lpfc_sli_get_nvmet_sglq'
drivers/scsi/lpfc/lpfc_sli.c:1207: warning: Excess function parameter 'piocb' description in '__lpfc_sli_get_nvmet_sglq'
drivers/scsi/lpfc/lpfc_sli.c:2243: warning: Function parameter or member 'rb_list' not described in 'lpfc_sli_hbqbuf_get'
drivers/scsi/lpfc/lpfc_sli.c:2243: warning: Excess function parameter 'phba' description in 'lpfc_sli_hbqbuf_get'
drivers/scsi/lpfc/lpfc_sli.c:2243: warning: Excess function parameter 'hbqno' description in 'lpfc_sli_hbqbuf_get'
drivers/scsi/lpfc/lpfc_sli.c:2262: warning: Function parameter or member 'hrq' not described in 'lpfc_sli_rqbuf_get'
drivers/scsi/lpfc/lpfc_sli.c:2262: warning: Excess function parameter 'hbqno' description in 'lpfc_sli_rqbuf_get'
drivers/scsi/lpfc/lpfc_sli.c:3429: warning: Function parameter or member 't' not described in 'lpfc_poll_eratt'
drivers/scsi/lpfc/lpfc_sli.c:3429: warning: Excess function parameter 'ptr' description in 'lpfc_poll_eratt'
drivers/scsi/lpfc/lpfc_sli.c:4115: warning: Excess function parameter 'pring' description in 'lpfc_sli_abort_fcp_rings'
drivers/scsi/lpfc/lpfc_sli.c:5331: warning: Excess function parameter 'mboxq' description in 'lpfc_sli4_read_fcoe_params'
drivers/scsi/lpfc/lpfc_sli.c:5879: warning: Function parameter or member 'extnt_cnt' not described in 'lpfc_sli4_cfg_post_extnts'
drivers/scsi/lpfc/lpfc_sli.c:5879: warning: Function parameter or member 'type' not described in 'lpfc_sli4_cfg_post_extnts'
drivers/scsi/lpfc/lpfc_sli.c:5879: warning: Function parameter or member 'emb' not described in 'lpfc_sli4_cfg_post_extnts'
drivers/scsi/lpfc/lpfc_sli.c:5879: warning: Function parameter or member 'mbox' not described in 'lpfc_sli4_cfg_post_extnts'
drivers/scsi/lpfc/lpfc_sli.c:6459: warning: Function parameter or member 'pmb' not described in 'lpfc_sli4_ras_mbox_cmpl'
drivers/scsi/lpfc/lpfc_sli.c:6459: warning: Excess function parameter 'pmboxq' description in 'lpfc_sli4_ras_mbox_cmpl'
drivers/scsi/lpfc/lpfc_sli.c:6912: warning: Function parameter or member 'extnt_cnt' not described in 'lpfc_sli4_get_allocated_extnts'
drivers/scsi/lpfc/lpfc_sli.c:6912: warning: Excess function parameter 'extnt_count' description in 'lpfc_sli4_get_allocated_extnts'
drivers/scsi/lpfc/lpfc_sli.c:7064: warning: Excess function parameter 'pring' description in 'lpfc_sli4_repost_sgl_list'
drivers/scsi/lpfc/lpfc_sli.c:7312: warning: Function parameter or member 'phba' not described in 'lpfc_init_idle_stat_hb'
drivers/scsi/lpfc/lpfc_sli.c:8022: warning: Function parameter or member 't' not described in 'lpfc_mbox_timeout'
drivers/scsi/lpfc/lpfc_sli.c:8022: warning: Excess function parameter 'ptr' description in 'lpfc_mbox_timeout'
drivers/scsi/lpfc/lpfc_sli.c:8902: warning: Function parameter or member 'mboxq' not described in 'lpfc_sli_issue_mbox_s4'
drivers/scsi/lpfc/lpfc_sli.c:8902: warning: Excess function parameter 'pmbox' description in 'lpfc_sli_issue_mbox_s4'
drivers/scsi/lpfc/lpfc_sli.c:9413: warning: Function parameter or member 'piocbq' not described in 'lpfc_sli4_bpl2sgl'
drivers/scsi/lpfc/lpfc_sli.c:9413: warning: Excess function parameter 'piocb' description in 'lpfc_sli4_bpl2sgl'
drivers/scsi/lpfc/lpfc_sli.c:9518: warning: Function parameter or member 'iocbq' not described in 'lpfc_sli4_iocb2wqe'
drivers/scsi/lpfc/lpfc_sli.c:9518: warning: Excess function parameter 'piocb' description in 'lpfc_sli4_iocb2wqe'
drivers/scsi/lpfc/lpfc_sli.c:10212: warning: Function parameter or member 'phba' not described in '__lpfc_sli_issue_iocb'
drivers/scsi/lpfc/lpfc_sli.c:10212: warning: Function parameter or member 'ring_number' not described in '__lpfc_sli_issue_iocb'
drivers/scsi/lpfc/lpfc_sli.c:10212: warning: Function parameter or member 'piocb' not described in '__lpfc_sli_issue_iocb'
drivers/scsi/lpfc/lpfc_sli.c:10212: warning: Function parameter or member 'flag' not described in '__lpfc_sli_issue_iocb'
drivers/scsi/lpfc/lpfc_sli.c:10300: warning: Function parameter or member 'ring_number' not described in 'lpfc_sli_issue_iocb'
drivers/scsi/lpfc/lpfc_sli.c:10300: warning: Excess function parameter 'pring' description in 'lpfc_sli_issue_iocb'
drivers/scsi/lpfc/lpfc_sli.c:11807: warning: Function parameter or member 'cmd' not described in 'lpfc_sli_abort_taskmgmt'
drivers/scsi/lpfc/lpfc_sli.c:11807: warning: Excess function parameter 'taskmgmt_cmd' description in 'lpfc_sli_abort_taskmgmt'
drivers/scsi/lpfc/lpfc_sli.c:12067: warning: Function parameter or member 'ring_number' not described in 'lpfc_sli_issue_iocb_wait'
drivers/scsi/lpfc/lpfc_sli.c:12067: warning: Excess function parameter 'pring' description in 'lpfc_sli_issue_iocb_wait'
drivers/scsi/lpfc/lpfc_sli.c:12262: warning: Function parameter or member 'mbx_action' not described in 'lpfc_sli_mbox_sys_shutdown'
drivers/scsi/lpfc/lpfc_sli.c:13219: warning: Function parameter or member 'irspiocbq' not described in 'lpfc_sli4_els_wcqe_to_rspiocbq'
drivers/scsi/lpfc/lpfc_sli.c:13219: warning: Excess function parameter 'wcqe' description in 'lpfc_sli4_els_wcqe_to_rspiocbq'
drivers/scsi/lpfc/lpfc_sli.c:13285: warning: Function parameter or member 'mcqe' not described in 'lpfc_sli4_sp_handle_async_event'
drivers/scsi/lpfc/lpfc_sli.c:13285: warning: Excess function parameter 'cqe' description in 'lpfc_sli4_sp_handle_async_event'
drivers/scsi/lpfc/lpfc_sli.c:13318: warning: Function parameter or member 'mcqe' not described in 'lpfc_sli4_sp_handle_mbox_event'
drivers/scsi/lpfc/lpfc_sli.c:13318: warning: Excess function parameter 'cqe' description in 'lpfc_sli4_sp_handle_mbox_event'
drivers/scsi/lpfc/lpfc_sli.c:13441: warning: Function parameter or member 'cq' not described in 'lpfc_sli4_sp_handle_mcqe'
drivers/scsi/lpfc/lpfc_sli.c:13768: warning: Function parameter or member 'speq' not described in 'lpfc_sli4_sp_handle_eqe'
drivers/scsi/lpfc/lpfc_sli.c:14126: warning: Function parameter or member 'cq' not described in 'lpfc_sli4_nvmet_handle_rcqe'
drivers/scsi/lpfc/lpfc_sli.c:14235: warning: Function parameter or member 'cqe' not described in 'lpfc_sli4_fp_handle_cqe'
drivers/scsi/lpfc/lpfc_sli.c:14235: warning: Excess function parameter 'eqe' description in 'lpfc_sli4_fp_handle_cqe'
drivers/scsi/lpfc/lpfc_sli.c:14336: warning: Function parameter or member 'eq' not described in 'lpfc_sli4_hba_handle_eqe'
drivers/scsi/lpfc/lpfc_sli.c:14808: warning: Function parameter or member 'entry_count' not described in 'lpfc_sli4_queue_alloc'
drivers/scsi/lpfc/lpfc_sli.c:15185: warning: Function parameter or member 'type' not described in 'lpfc_cq_create'
drivers/scsi/lpfc/lpfc_sli.c:15185: warning: Function parameter or member 'subtype' not described in 'lpfc_cq_create'
drivers/scsi/lpfc/lpfc_sli.c:15333: warning: Function parameter or member 'type' not described in 'lpfc_cq_create_set'
drivers/scsi/lpfc/lpfc_sli.c:15333: warning: Function parameter or member 'subtype' not described in 'lpfc_cq_create_set'
drivers/scsi/lpfc/lpfc_sli.c:16063: warning: Function parameter or member 'subtype' not described in 'lpfc_rq_create'
drivers/scsi/lpfc/lpfc_sli.c:16353: warning: Function parameter or member 'subtype' not described in 'lpfc_mrq_create'
drivers/scsi/lpfc/lpfc_sli.c:16533: warning: Function parameter or member 'phba' not described in 'lpfc_eq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16590: warning: Function parameter or member 'phba' not described in 'lpfc_cq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16644: warning: Function parameter or member 'phba' not described in 'lpfc_mq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16644: warning: Function parameter or member 'mq' not described in 'lpfc_mq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16644: warning: Excess function parameter 'qm' description in 'lpfc_mq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16698: warning: Function parameter or member 'phba' not described in 'lpfc_wq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16754: warning: Function parameter or member 'phba' not described in 'lpfc_rq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16754: warning: Function parameter or member 'hrq' not described in 'lpfc_rq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16754: warning: Function parameter or member 'drq' not described in 'lpfc_rq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16754: warning: Excess function parameter 'rq' description in 'lpfc_rq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16940: warning: Function parameter or member 'xri' not described in '__lpfc_sli4_free_xri'
drivers/scsi/lpfc/lpfc_sli.c:16955: warning: Function parameter or member 'xri' not described in 'lpfc_sli4_free_xri'
drivers/scsi/lpfc/lpfc_sli.c:17002: warning: Function parameter or member 'post_cnt' not described in 'lpfc_sli4_post_sgl_list'
drivers/scsi/lpfc/lpfc_sli.c:17002: warning: Excess function parameter 'count' description in 'lpfc_sli4_post_sgl_list'
drivers/scsi/lpfc/lpfc_sli.c:17221: warning: Function parameter or member 'sb_count' not described in 'lpfc_sli4_post_io_sgl_list'
drivers/scsi/lpfc/lpfc_sli.c:17451: warning: Function parameter or member 'did' not described in 'lpfc_fc_frame_to_vport'
drivers/scsi/lpfc/lpfc_sli.c:17590: warning: Function parameter or member 'vport' not described in 'lpfc_fc_frame_add'
drivers/scsi/lpfc/lpfc_sli.c:17817: warning: Function parameter or member 'vport' not described in 'lpfc_sli4_seq_abort_rsp'
drivers/scsi/lpfc/lpfc_sli.c:17817: warning: Function parameter or member 'aborted' not described in 'lpfc_sli4_seq_abort_rsp'
drivers/scsi/lpfc/lpfc_sli.c:17817: warning: Excess function parameter 'phba' description in 'lpfc_sli4_seq_abort_rsp'
drivers/scsi/lpfc/lpfc_sli.c:18060: warning: Function parameter or member 'seq_dmabuf' not described in 'lpfc_prep_seq'
drivers/scsi/lpfc/lpfc_sli.c:18060: warning: Excess function parameter 'dmabuf' description in 'lpfc_prep_seq'
drivers/scsi/lpfc/lpfc_sli.c:18332: warning: Function parameter or member 'dmabuf' not described in 'lpfc_sli4_handle_received_buffer'
drivers/scsi/lpfc/lpfc_sli.c:18655: warning: Function parameter or member 'rpi' not described in '__lpfc_sli4_free_rpi'
drivers/scsi/lpfc/lpfc_sli.c:18683: warning: Function parameter or member 'rpi' not described in 'lpfc_sli4_free_rpi'
drivers/scsi/lpfc/lpfc_sli.c:18714: warning: Function parameter or member 'ndlp' not described in 'lpfc_sli4_resume_rpi'
drivers/scsi/lpfc/lpfc_sli.c:18714: warning: Function parameter or member 'cmpl' not described in 'lpfc_sli4_resume_rpi'
drivers/scsi/lpfc/lpfc_sli.c:18714: warning: Function parameter or member 'arg' not described in 'lpfc_sli4_resume_rpi'
drivers/scsi/lpfc/lpfc_sli.c:18714: warning: Excess function parameter 'phba' description in 'lpfc_sli4_resume_rpi'
drivers/scsi/lpfc/lpfc_sli.c:19103: warning: Function parameter or member 'phba' not described in 'lpfc_check_next_fcf_pri_level'
drivers/scsi/lpfc/lpfc_sli.c:19266: warning: Function parameter or member 'fcf_index' not described in 'lpfc_sli4_fcf_rr_index_set'
drivers/scsi/lpfc/lpfc_sli.c:19295: warning: Function parameter or member 'fcf_index' not described in 'lpfc_sli4_fcf_rr_index_clear'
drivers/scsi/lpfc/lpfc_sli.c:19331: warning: Function parameter or member 'mbox' not described in 'lpfc_mbx_cmpl_redisc_fcf_table'
drivers/scsi/lpfc/lpfc_sli.c:20027: warning: Function parameter or member 'pwqeq' not described in 'lpfc_wqe_bpl2sgl'
drivers/scsi/lpfc/lpfc_sli.c:20027: warning: Excess function parameter 'pwqe' description in 'lpfc_wqe_bpl2sgl'
drivers/scsi/lpfc/lpfc_sli.c:20141: warning: Function parameter or member 'qp' not described in 'lpfc_sli4_issue_wqe'
drivers/scsi/lpfc/lpfc_sli.c:20141: warning: Excess function parameter 'ring_number' description in 'lpfc_sli4_issue_wqe'
drivers/scsi/lpfc/lpfc_sli.c:20434: warning: Function parameter or member 'qp' not described in '_lpfc_move_xri_pbl_to_pvt'
drivers/scsi/lpfc/lpfc_sli.c:20552: warning: Function parameter or member 'hwqid' not described in 'lpfc_keep_pvt_pool_above_lowwm'
drivers/scsi/lpfc/lpfc_sli.c:20552: warning: Excess function parameter 'qp' description in 'lpfc_keep_pvt_pool_above_lowwm'
drivers/scsi/lpfc/lpfc_sli.c:20682: warning: Function parameter or member 'qp' not described in 'lpfc_get_io_buf_from_private_pool'
Link: https://lore.kernel.org/r/20200721164148.2617584-24-lee.jones@linaro.org
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes the following W=1 kernel build warning(s):
drivers/scsi/lpfc/lpfc_sli.c: In function ‘lpfc_wq_create’:
drivers/scsi/lpfc/lpfc_sli.c:15810:16: warning: variable ‘pg_addr’ set but not used [-Wunused-but-set-variable]
15810 | unsigned long pg_addr;
| ^~~~~~~
Link: https://lore.kernel.org/r/20200721164148.2617584-21-lee.jones@linaro.org
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix smatch warning:
drivers/scsi/lpfc/lpfc_sli.c:15156 lpfc_cq_poll_hdler() warn:
inconsistent indenting
Link: https://lore.kernel.org/r/20200707150018.823350-1-colin.king@canonical.com
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current logging methods typically end up requesting a reproduction with
a different logging level set to figure out what happened. This was mainly
by design to not clutter the kernel log messages with things that were
typically not interesting and the messages themselves could cause other
issues.
When looking to make a better system, it was seen that in many cases when
more data was wanted was when another message, usually at KERN_ERR level,
was logged. And in most cases, what the additional logging that was then
enabled was typically. Most of these areas fell into the discovery machine.
Based on this summary, the following design has been put in place: The
driver will maintain an internal log (256 elements of 256 bytes). The
"additional logging" messages that are usually enabled in a reproduction
will be changed to now log all the time to the internal log. A new logging
level is defined - LOG_TRACE_EVENT. When this level is set (it is not by
default) and a message marked as KERN_ERR is logged, all the messages in
the internal log will be dumped to the kernel log before the KERN_ERR
message is logged.
There is a timestamp on each message added to the internal log. However,
this timestamp is not converted to wall time when logged. The value of the
timestamp is solely to give a crude time reference for the messages.
Link: https://lore.kernel.org/r/20200630215001.70793-14-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Although the existing implementation is very good at high I/O load, on
tests involving light load, especially on only a few hardware queues,
latency was a little higher than it can be due to using workqueue
scheduling. Other tasks in the system can delay handling.
Change the lower level to use irq_poll by default which uses a softirq for
I/O completion. This gives better latency as variance in when the cq is
processed is reduced over the workqueue interface. However, as high load is
better served by not being in softirq when the CPU is loaded, work queues
are still used under high I/O load.
Link: https://lore.kernel.org/r/20200630215001.70793-13-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, if there has been an issue whereby an adapter dump was taken,
there is nothing displayed to hint that it is present. Utilities must be
run and they must query for the status in order to then download the dump.
Add a message to the driver to query dump image presence when initializing
the SLI Port.
Link: https://lore.kernel.org/r/20200630215001.70793-12-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Change vocabulary of 0373 log msg from "error" to "cmpl" The current
language of the 0373 message contains the word "error" which caused a
number of customers to inquire about the "error" and if it should be a
concern. It isn't an error, it's simply an io completion status.
Revise the message to replace the word "error" with "cmpl" for completion.
Link: https://lore.kernel.org/r/20200630215001.70793-10-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the kdump kernel shuts down lpfc calls flush_work_queue on an
interrupt to schedule the cq handler. When there is only one CPU active on
the kdump kernel, it is possible for the work_on to get scheduled on a
non-active CPU causing it to never be scheduled.
When in the kdump environment, per-CPU affinity of cq's to cpus is not
necessary. In those cases, use a general queue_work rather than a
queue_work_on().
Link: https://lore.kernel.org/r/20200630215001.70793-9-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Call traces have been observed running different tests that involve aborts
and setting the rrq active flag. The lpfc_set_rrq_active routine is doing
a mempool_alloc under the soft_irq processing level. When the mempool needs
to get a new buffer from the free pool and has to wait for memory to become
free it will check the flags passed in on the alloc and dump the stack if
the thread is running in interrupt context.
Replace the GFP_KERNEL flag with GFP_ATOMIC so that the memory allocation
will not attempt to sleep if there is no mem available.
Link: https://lore.kernel.org/r/20200630215001.70793-7-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When using DUMP on SLI3 to read VPD and Port status data (config region
23), the adapter is overruning the kmalloc'd buffer causing havoc on other
consumers of the allocation pools.
Rework the loops processing the dump data and validate/size memory lengths
before performing bcopy.
Link: https://lore.kernel.org/r/20200630215001.70793-6-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Visual code inspection of the MDS implementation revealed two errors in
the driver:
- The set features Feature Code had an incorrect value
- The routine that classifies command type for cmd completions was missing
the Send Frame definition. Send Frame is used for MDS driver loopback.
Link: https://lore.kernel.org/r/20200630215001.70793-3-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This series consists of the usual driver updates (qla2xxx, ufs, zfcp,
target, scsi_debug, lpfc, qedi, qedf, hisi_sas, mpt3sas) plus a host
of other minor updates. There are no major core changes in this
series apart from a refactoring in scsi_lib.c.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXtq5QyYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishXyGAQCipTWx
7kHKHZBCVTU133bADt3+SstLrAm8PKZEXMnP9wEAzu4QkkW8URxEDRrpu7qk5gbA
9M/KyqvfRtTH7+BSK7M=
=J6aO
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
:This series consists of the usual driver updates (qla2xxx, ufs, zfcp,
target, scsi_debug, lpfc, qedi, qedf, hisi_sas, mpt3sas) plus a host
of other minor updates.
There are no major core changes in this series apart from a
refactoring in scsi_lib.c"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits)
scsi: ufs: ti-j721e-ufs: Fix unwinding of pm_runtime changes
scsi: cxgb3i: Fix some leaks in init_act_open()
scsi: ibmvscsi: Make some functions static
scsi: iscsi: Fix deadlock on recovery path during GFP_IO reclaim
scsi: ufs: Fix WriteBooster flush during runtime suspend
scsi: ufs: Fix index of attributes query for WriteBooster feature
scsi: ufs: Allow WriteBooster on UFS 2.2 devices
scsi: ufs: Remove unnecessary memset for dev_info
scsi: ufs-qcom: Fix scheduling while atomic issue
scsi: mpt3sas: Fix reply queue count in non RDPQ mode
scsi: lpfc: Fix lpfc_nodelist leak when processing unsolicited event
scsi: target: tcmu: Fix a use after free in tcmu_check_expired_queue_cmd()
scsi: vhost: Notify TCM about the maximum sg entries supported per command
scsi: qla2xxx: Remove return value from qla_nvme_ls()
scsi: qla2xxx: Remove an unused function
scsi: iscsi: Register sysfs for iscsi workqueue
scsi: scsi_debug: Parser tables and code interaction
scsi: core: Refactor scsi_mq_setup_tags function
scsi: core: Fix incorrect usage of shost_for_each_device
scsi: qla2xxx: Fix endianness annotations in source files
...
The axchg structure is a structure allocated early in the
lpfc_nvme_unsol_ls_handler() to represent the newly received exchange.
Upon error, the out_fail path in the routine unconditionally frees the
pointer, yet subsequently passes the pointer to the abort routine.
Additionally, the abort routine, lpfc_nvme_unsol_ls_issue_abort(), also
has a failure path that will attempt to delete the pointer on error.
Fix these errors by:
- Removing the unconditional free so that it stays valid if passed
to the abort routine.
- Revise the abort routine to not free the pointer. Instead, return
a success/failure status. Note: if success, the later completion of
the abort frees the structure.
- Back in the unsol_ls_handler() error path, if the abort routine was
skipped (thus no possible reference) or the abort routine returned
error, free the pointer.
Fixes: 3a8070c567 ("lpfc: Refactor NVME LS receive handling")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
In preparation for supporting both intiator mode and target mode
receiving NVME LS's, commonize the existing NVME LS request receive
handling found in the base driver and in the nvmet side.
Using the original lpfc_nvmet_unsol_ls_event() and
lpfc_nvme_unsol_ls_buffer() routines as a templates, commonize the
reception of an NVME LS request. The common routine will validate the LS
request, that it was received from a logged-in node, and allocate a
lpfc_async_xchg_ctx that is used to manage the LS request. The role of
the port is then inspected to determine which handler is to receive the
LS - nvme or nvmet. As such, the nvmet handler is tied back in. A handler
is created in nvme and is stubbed out.
Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
To support FC-NVME-2 support (actually FC-NVME (rev 1) with Ammendment 1),
both the nvme (host) and nvmet (controller/target) sides will need to be
able to receive LS requests. Currently, this support is in the nvmet side
only. To prepare for both sides supporting LS receive, rename
lpfc_nvmet_rcv_ctx to lpfc_async_xchg_ctx and commonize the definition.
Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A lot of files in lpfc include nvme headers, building up relationships that
require a file to change for its headers when there is no other change
necessary. It would be better to localize the nvme headers.
There is also no need for separate nvme (initiator) and nvmet (tgt)
header files.
Refactor the inclusion of nvme headers so that all nvme items are
included by lpfc_nvme.h
Merge lpfc_nvmet.h into lpfc_nvme.h so that there is a single header used
by both the nvme and nvmet sides. This prepares for structure sharing
between the two roles. Prep to add shared function prototypes for upcoming
shared routines.
Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Running make C=1 M=drivers/scsi/lpfc triggers sparse warnings
Correct the code generating the following errors:
- Incompatible address space assignment without proper conversion.
- Deference of usespace and per-cpu pointers.
Link: https://lore.kernel.org/r/20200501214310.91713-8-jsmart2021@gmail.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In an audit of lockdep calls in the driver, there are multiple lockdep
checks in successive calling layers. E.g. a routine checks, and then calls
a lower routine that also checks, and so on. Calling sequences result in
many redundant checks.
Refine the code to remove lower-level lockdep checks. Update comments on
the lock, correcting a few places where lock object in comment was
incorrect.
Link: https://lore.kernel.org/r/20200501214310.91713-7-jsmart2021@gmail.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A previous change introduced the atomic use of queue_claimed flag for eq's
and cq's. The code works fine, but the clearing of the queue_claimed flag
is not atomic.
Change queue_claimed = 0 into xchg(&queue_claimed, 0) to be consistent for
change under atomicity.
Link: https://lore.kernel.org/r/20200501214310.91713-3-jsmart2021@gmail.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During code review, identified dss feature that was a prototype only and
was never productized in SLI3. They shouldn't be there and prevents reuse
of the command areas.
Remove any code in the driver to deal with dss, including code to deal with
fips, which is associated with the dss feature.
Link: https://lore.kernel.org/r/20200322181304.37655-12-jsmart2021@gmail.com
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The lpfc_sli4_wq_release() routine iterates for each interim value when
updating the wq consuemr index. This wastes cycles and possibly confuses
things as thevalue itterates (and the modulo logic is being applied).
There's no reason for this. Just set it to the value from the hw.
Link: https://lore.kernel.org/r/20200322181304.37655-7-jsmart2021@gmail.com
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Injecting EEH on a 32GB card is causing kernel oops
The pci error handler is doing an IO flush and the offline code is also
doing an IO flush. When the 1st flush is complete the hdwq is destroyed
(freed), yet the second flush accesses the hdwq and crashes.
Added a check in lpfc_sli4_fush_io_rings to check both the HBA_IOQ_FLUSH
flag and the hdwq pointer to see if it is already set and not already
freed.
Link: https://lore.kernel.org/r/20200322181304.37655-6-jsmart2021@gmail.com
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The following lockdep error was reported when unloading the lpfc driver:
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
...
Call Trace:
dump_stack+0x96/0xe0
register_lock_class+0x8b8/0x8c0
? lockdep_hardirqs_on+0x190/0x280
? is_dynamic_key+0x150/0x150
? wait_for_completion_interruptible+0x2a0/0x2a0
? wake_up_q+0xd0/0xd0
__lock_acquire+0xda/0x21a0
? register_lock_class+0x8c0/0x8c0
? synchronize_rcu_expedited+0x500/0x500
? __call_rcu+0x850/0x850
lock_acquire+0xf3/0x1f0
? del_timer_sync+0x5/0xb0
del_timer_sync+0x3c/0xb0
? del_timer_sync+0x5/0xb0
lpfc_pci_remove_one.cold.102+0x8b7/0x935 [lpfc]
...
Unloading the driver resulted in a call to del_timer_sync for the
cpuhp_poll_timer. However the call to setup the timer had never been made,
so the timer structures used by lockdep checking were not initialized.
Unconditionally call setup_timer for the cpuhp_poll_timer during driver
initialization. Calls to start the timer remain "as needed".
Link: https://lore.kernel.org/r/20200322181304.37655-3-jsmart2021@gmail.com
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The following kasan bug was called out:
BUG: KASAN: slab-out-of-bounds in lpfc_unreg_login+0x7c/0xc0 [lpfc]
Read of size 2 at addr ffff889fc7c50a22 by task lpfc_worker_3/6676
...
Call Trace:
dump_stack+0x96/0xe0
? lpfc_unreg_login+0x7c/0xc0 [lpfc]
print_address_description.constprop.6+0x1b/0x220
? lpfc_unreg_login+0x7c/0xc0 [lpfc]
? lpfc_unreg_login+0x7c/0xc0 [lpfc]
__kasan_report.cold.9+0x37/0x7c
? lpfc_unreg_login+0x7c/0xc0 [lpfc]
kasan_report+0xe/0x20
lpfc_unreg_login+0x7c/0xc0 [lpfc]
lpfc_sli_def_mbox_cmpl+0x334/0x430 [lpfc]
...
When processing the completion of a "Reg Rpi" login mailbox command in
lpfc_sli_def_mbox_cmpl, a call may be made to lpfc_unreg_login. The vpi is
extracted from the completing mailbox context and passed as an input for
the next. However, the vpi stored in the mailbox command context is an
absolute vpi, which for SLI4 represents both base + offset. When used with
a non-zero base component, (function id > 0) this results in an
out-of-range access beyond the allocated phba->vpi_ids array.
Fix by subtracting the function's base value to get an accurate vpi number.
Link: https://lore.kernel.org/r/20200322181304.37655-2-jsmart2021@gmail.com
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch modifies lpfc to register for Link Integrity events via the use
of an RDF ELS and to perform Link Integrity FPIN logging.
Specifically, the driver was modified to:
- Format and issue the RDF ELS immediately following SCR registration.
This registers the ability of the driver to receive FPIN ELS.
- Adds decoding of the FPIN els into the received descriptors, with
logging of the Link Integrity event information. After decoding, the ELS
is delivered to the scsi fc transport to be delivered to any user-space
applications.
- To aid in logging, simple helpers were added to create enum to name
string lookup functions that utilize the initialization helpers from the
fc_els.h header.
- Note: base header definitions for the ELS's don't populate the
descriptor payloads. As such, lpfc creates it's own version of the
structures, using the base definitions (mostly headers) and additionally
declaring the descriptors that will complete the population of the ELS.
Link: https://lore.kernel.org/r/20200210173155.547-3-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update copyrights to 2020 for files modified in the 12.6.0.4 patch set.
Link: https://lore.kernel.org/r/20200128002312.16346-13-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current code does some odd +1 over maximum xri count checks and
requires that the lun_queue_count can't be bigger than maximum xri count
divided by 8. These items are bogus.
Clean the code up to cap lun_queue_count to maximum xri count.
Link: https://lore.kernel.org/r/20200128002312.16346-10-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is occasionally seeing the following SLI Port error, requiring
reset and reinit:
Port Status Event: ... error 1=0x52004a01, error 2=0x218
The failure means an RQ timeout. That is, the adapter had received
asynchronous receive frames, ran out of buffer slots to place the frames,
and the driver did not replenish the buffer slots before a timeout
occurred. The driver should not be so slow in replenishing buffers that a
timeout can occur.
When the driver received all the frames of a sequence, it allocates an IOCB
to put the frames in. In a situation where there was no IOCB available for
the frame of a sequence, the RQ buffer corresponding to the first frame of
the sequence was not returned to the FW. Eventually, with enough traffic
encountering the situation, the timeout occurred.
Fix by releasing the buffer back to firmware whenever there is no IOCB for
the first frame.
[mkp: typo]
Link: https://lore.kernel.org/r/20200128002312.16346-2-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This series concludes the work I did for linux-5.5 on the compat_ioctl()
cleanup, killing off fs/compat_ioctl.c and block/compat_ioctl.c by moving
everything into drivers.
Overall this would be a reduction both in complexity and line count, but
as I'm also adding documentation the overall number of lines increases
in the end.
My plan was originally to keep the SCSI and block parts separate.
This did not work easily because of interdependencies: I cannot
do the final SCSI cleanup in a good way without first addressing the
CDROM ioctls, so this is one series that I hope could be merged through
either the block or the scsi git trees, or possibly both if you can
pull in the same branch.
The series comes in these steps:
1. clean up the sg v3 interface as suggested by Linus. I have
talked about this with Doug Gilbert as well, and he would
rebase his sg v4 patches on top of "compat: scsi: sg: fix v3
compat read/write interface"
2. Actually moving handlers out of block/compat_ioctl.c and
block/scsi_ioctl.c into drivers, mixed in with cleanup
patches
3. Document how to do this right. I keep getting asked about this,
and it helps to point to some documentation file.
The branch is based on another one that fixes a couple of bugs found
during the creation of this series.
Changes since v3:
https://lore.kernel.org/lkml/20200102145552.1853992-1-arnd@arndb.de/
- Move sr_compat_ioctl fixup to correct patch (Ben Hutchings)
- Add Reviewed-by tags
Changes since v2:
https://lore.kernel.org/lkml/20191217221708.3730997-1-arnd@arndb.de/
- Rebase to v5.5-rc4, which contains the earlier bugfixes
- Fix sr_block_compat_ioctl() error handling bug found by
Ben Hutchings
- Fix idecd_locked_compat_ioctl() compat_ptr() bug
- Don't try to handle HDIO_DRIVE_TASKFILE in drivers/ide
- More documentation improvements
Changes since v1:
https://lore.kernel.org/lkml/20191211204306.1207817-1-arnd@arndb.de/
- move out the bugfixes into a branch for itself
- clean up scsi sg driver further as suggested by Christoph Hellwig
- avoid some ifdefs by moving compat_ptr() out of asm/compat.h
- split out the blkdev_compat_ptr_ioctl function; bug spotted by
Ben Hutchings
- Improve formatting of documentation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJeDv8JAAoJEGCrR//JCVInh/oP/2BHdQvWONxwXXg2BLH7OJHm
4PFoblxjNH/pwHm2PKh2uj8vUSgTHqID7NJChVgKZaZEJEJR7h26Sx60p+yTAepR
/ysQiGameacJu2ZzKPYc4/S33Yu8cogQ5+DSz7mI9T5Yw0HSAE0JZ5xd9KIZ+/u8
6k65ujd9kCxCgmtXrpx+7JFF0xb+urXKCvjdt2EfQ1ZmuMX5rDG/bTNg5JJ50shW
vb7Z8hCpfW61ux8M/dgIh4WvUf0SA7FOy8WF1Km9gNhKGj41Arb2lmX1Jb4jDgjl
DGsXQupyMVwigp5N37H3o1MamX/C8S49c16/zJQcJj64xX7WdxhE5kR8JIf+36Tf
2l4wpaqVukXPvXkdv76Y472fKoOMZATF6kCoEPG3gXW9oxXDs5d2ofALfO3uNfLB
PC4hzorw6bBlt67qAqERft2cxMMi9xSYfYZ8jD+eSF8WLL7xIcEazZqq8dKz7O00
Qqx6+jzejT18av7cPfLjnupZg+mEcxDbPeuCgjrbhR8lcUI4DBu379RiTaQanvyR
W00zwqCZWYnNJoha8u3AKsRcfL8eziF+/K9k+lCuhXeQBI4ipFJ03wAD4TWCigCS
N7AikOdLzGVxE+2IfeCXPDKpdT6hFnjulnyDEgc/7jwHzcVF3MQBHwXiKhWHEUvT
/AzAtKAiivp+uaMgAbzd
=cddL
-----END PGP SIGNATURE-----
Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue
Pull compat_ioctl cleanup from Arnd. Here's his description:
This series concludes the work I did for linux-5.5 on the compat_ioctl()
cleanup, killing off fs/compat_ioctl.c and block/compat_ioctl.c by moving
everything into drivers.
Overall this would be a reduction both in complexity and line count, but
as I'm also adding documentation the overall number of lines increases
in the end.
My plan was originally to keep the SCSI and block parts separate.
This did not work easily because of interdependencies: I cannot
do the final SCSI cleanup in a good way without first addressing the
CDROM ioctls, so this is one series that I hope could be merged through
either the block or the scsi git trees, or possibly both if you can
pull in the same branch.
The series comes in these steps:
1. clean up the sg v3 interface as suggested by Linus. I have
talked about this with Doug Gilbert as well, and he would
rebase his sg v4 patches on top of "compat: scsi: sg: fix v3
compat read/write interface"
2. Actually moving handlers out of block/compat_ioctl.c and
block/scsi_ioctl.c into drivers, mixed in with cleanup
patches
3. Document how to do this right. I keep getting asked about this,
and it helps to point to some documentation file.
The branch is based on another one that fixes a couple of bugs found
during the creation of this series.
Changes since v3:
https://lore.kernel.org/lkml/20200102145552.1853992-1-arnd@arndb.de/
- Move sr_compat_ioctl fixup to correct patch (Ben Hutchings)
- Add Reviewed-by tags
Changes since v2:
https://lore.kernel.org/lkml/20191217221708.3730997-1-arnd@arndb.de/
- Rebase to v5.5-rc4, which contains the earlier bugfixes
- Fix sr_block_compat_ioctl() error handling bug found by
Ben Hutchings
- Fix idecd_locked_compat_ioctl() compat_ptr() bug
- Don't try to handle HDIO_DRIVE_TASKFILE in drivers/ide
- More documentation improvements
Changes since v1:
https://lore.kernel.org/lkml/20191211204306.1207817-1-arnd@arndb.de/
- move out the bugfixes into a branch for itself
- clean up scsi sg driver further as suggested by Christoph Hellwig
- avoid some ifdefs by moving compat_ptr() out of asm/compat.h
- split out the blkdev_compat_ptr_ioctl function; bug spotted by
Ben Hutchings
- Improve formatting of documentation
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When running Cisco-MDS diagnostics which perform driver-level frame loop
back, the switch is reporting errors. Diagnostic has a limit on latency
that is not being met by the driver.
The requirement of Latency frames is that they should be responded back by
the host with a maximum delay of few hundreds of microseconds. If the
switch doesn't get response frames within this time frame, it fails the
test.
Test is failing as the lpfc-wq workqueue was overwhelmed by the packet rate
and in some cases, the work element yielded to other kernel elements.
To resolve, reduce the outstanding load allowed by the adapter. This
ensures the driver spends a reasonable amount of time doing loopback and
can do so such that latency values can be met. Load is managed by reducing
the number of receive buffers posted such that the link can be
backpressured to reduce load.
Link: https://lore.kernel.org/r/20191218235808.31922-9-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the WriteObject mailbox response has change_status set to is 0x2
(Firmware Reset) or 0x04 (Port Migration Reset), the CSF field should also
be checked to see if a fw reset is sufficient to enable all new features in
the updated firmware image. If not, a fw reset would start the new
firmware, but with a feature level equal to existing firmware. To enable
the new features, a chip reset/pci slot reset would be required.
Check the CSF bit when change_status is 0x2 or 0x4 to know whether to
perform a pci bus reset or fw reset.
Link: https://lore.kernel.org/r/20191218235808.31922-4-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are spelling mistakes of asynchronous in a lpfc_printf_log message
and comments. Fix these.
Link: https://lore.kernel.org/r/20191218084301.627555-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is mostly update of the usual drivers: aacraid, ufs, zfcp,
NCR5380, lpfc, qla2xxx, smartpqi, hisi_sas, target, mpt3sas, pm80xx
plus a whole load of minor updates and fixes. The two major core
changes are Al Viro's reworking of sg's handling of copy to/from user,
Ming Lei's removal of the host busy counter to avoid contention in the
multiqueue case and Damien Le Moal's fixing of residual tracking
across error handling.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXeKvHCYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishQJMAQDAjlAi
SNfbyndMqyf+rZGWufDI+43Up1VvW9GeWJHeDwEAxfO5XZsCks2uT8UxXhpEp9L7
HkiUww3zbcgl0FWFkUM=
=cdVU
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly update of the usual drivers: aacraid, ufs, zfcp,
NCR5380, lpfc, qla2xxx, smartpqi, hisi_sas, target, mpt3sas, pm80xx
plus a whole load of minor updates and fixes.
The major core changes are Al Viro's reworking of sg's handling of
copy to/from user, Ming Lei's removal of the host busy counter to
avoid contention in the multiqueue case and Damien Le Moal's fixing of
residual tracking across error handling"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (251 commits)
scsi: bnx2fc: timeout calculation invalid for bnx2fc_eh_abort()
scsi: target: core: Fix a pr_debug() argument
scsi: iscsi: Don't send data to unbound connection
scsi: target: iscsi: Wait for all commands to finish before freeing a session
scsi: target: core: Release SPC-2 reservations when closing a session
scsi: target: core: Document target_cmd_size_check()
scsi: bnx2i: fix potential use after free
Revert "scsi: qla2xxx: Fix memory leak when sending I/O fails"
scsi: NCR5380: Add disconnect_mask module parameter
scsi: NCR5380: Unconditionally clear ICR after do_abort()
scsi: NCR5380: Call scsi_set_resid() on command completion
scsi: scsi_debug: num_tgts must be >= 0
scsi: lpfc: use hdwq assigned cpu for allocation
scsi: arcmsr: fix indentation issues
scsi: qla4xxx: fix double free bug
scsi: pm80xx: Modified the logic to collect fatal dump
scsi: pm80xx: Tie the interrupt name to the module instance
scsi: pm80xx: Controller fatal error through sysfs
scsi: pm80xx: Do not request 12G sas speeds
scsi: pm80xx: Cleanup command when a reset times out
...
Looking at the recent conversion from smp_processor_id() to
raw_smp_processor_id(), realized that the allocation should be based on the
cpu the hdwq is bound to, not the executing cpu.
Revise to pull cpu number from the hdwq
Fixes: 765ab6cdac ("scsi: lpfc: Fix a kernel warning triggered by lpfc_get_sgl_per_hdwq()")
Link: https://lore.kernel.org/r/20191116003847.6141-1-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Compilation can fail due to having an inline function reference where the
function body is not present.
Fix by removing the inline tag.
Fixes: 93a4d6f401 ("scsi: lpfc: Add registration for CPU Offline/Online events")
Link: https://lore.kernel.org/r/20191111230401.12958-4-jsmart2021@gmail.com
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix the following kernel bug report:
BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/954
Fixes: d79c9e9d4b ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.")
Link: https://lore.kernel.org/r/20191107052158.25788-2-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some adapters support the ability to hold multiple adapter dumps on the
adapter flash. Some adapters default to enabling this feature while others
default to single-dump.
Make support uniform by enabling dual dump by default.
Link: https://lore.kernel.org/r/20191105005708.7399-11-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The recent affinitization didn't address cpu offlining/onlining. If an
interrupt vector is shared and the low order cpu owning the vector is
offlined, as interrupts are managed, the vector is taken offline. This
causes the other CPUs sharing the vector will hang as they can't get io
completions.
Correct by registering callbacks with the system for Offline/Online
events. When a cpu is taken offline, its eq, which is tied to an interrupt
vector is found. If the cpu is the "owner" of the vector and if the
eq/vector is shared by other CPUs, the eq is placed into a polled mode.
Additionally, code paths that perform io submission on the "sharing CPUs"
will check the eq state and poll for completion after submission of new io
to a wq that uses the eq.
Similarly, when a cpu comes back online and owns an offlined vector, the eq
is taken out of polled mode and rearmed to start driving interrupts for eq.
Link: https://lore.kernel.org/r/20191105005708.7399-9-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the driver receives a login that is later then LOGO'd by the remote port
(aka ndlp), the driver, upon the completion of the LOGO ACC transmission,
will logout the node and unregister the rpi that is being used for the
node. As part of the unreg, the node's rpi value is replaced by the
LPFC_RPI_ALLOC_ERROR value. If the port is subsequently offlined, the
offline walks the nodes and ensures they are logged out, which possibly
entails unreg'ing their rpi values. This path does not validate the node's
rpi value, thus doesn't detect that it has been unreg'd already. The
replaced rpi value is then used when accessing the rpi bitmask array which
tracks active rpi values. As the LPFC_RPI_ALLOC_ERROR value is not a valid
index for the bitmask, it may fault the system.
Revise the rpi release code to detect when the rpi value is the replaced
RPI_ALLOC_ERROR value and ignore further release steps.
Link: https://lore.kernel.org/r/20191105005708.7399-2-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, the FW logging facility is a load/boot time parameter which
requires the driver to be unloaded/reloaded or the system rebooted in order
to change its configuration.
Convert the logging facility to allow dynamic enablement and configuration.
Specifically:
- Convert the feature so that it can be enabled dynamically via an
attribute. Additionally, the size of the buffer can be configured
dynamically.
- Add locks around states that now may be changing.
- Tie the feature into debugfs so that the logs can be read at any time.
Link: https://lore.kernel.org/r/20191018211832.7917-12-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The existing "auto eq delay" mechanism was sometimes skipping over an EQ,
not ramping the coalescing down under light load fast enough, and in other
cases never kicked in as cpu sharing by multiple vectors didn't quite add
up right.
Tweak the interrupt mechanism such that:
- Add a flag to the EQ to force checking for colaescing values when being
serviced in the interrupt handler. The flag will be set by any CQ bound
to the EQ whenever the number of CQ elements process in a single scan
meets or exceeds the hardware queue notify level. E.g. there's a
significant number of completions happening.
- In the heartbeat work item that checks coalescing:
- Replace the structure that was counting the number of EQs that
interrupted on a single cpu with a new structure that looks at the EQ
to see whether EQ currently has a coalescing value (thus it should be
re-evaluate) or was marked by the new flag indicating heavy
completions.
- When a cpu, which may be servicing multiple vectors, had at least 1 EQ
that should be checked, a new coalescing delay is calculated based on
the number of interrupts that occurred on the cpu.
- The new coalescing value is then applied to the EQs that had
interrupted on the cpu.
Link: https://lore.kernel.org/r/20191018211832.7917-11-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In cases where I/O may be aborted, such as driver unload or link bounces,
the system will crash based on a bad ndlp pointer.
Example:
RIP: 0010:lpfc_sli4_abts_err_handler+0x15/0x140 [lpfc]
...
lpfc_sli4_io_xri_aborted+0x20d/0x270 [lpfc]
lpfc_sli4_sp_handle_abort_xri_wcqe.isra.54+0x84/0x170 [lpfc]
lpfc_sli4_fp_handle_cqe+0xc2/0x480 [lpfc]
__lpfc_sli4_process_cq+0xc6/0x230 [lpfc]
__lpfc_sli4_hba_process_cq+0x29/0xc0 [lpfc]
process_one_work+0x14c/0x390
Crash was caused by a bad ndlp address passed to I/O indicated by the XRI
aborted CQE. The address was not NULL so the routine deferenced the ndlp
ptr. The bad ndlp also caused the lpfc_sli4_io_xri_aborted to call an
erroneous io handler. Root cause for the bad ndlp was an lpfc_ncmd that
was aborted, put on the abort_io list, completed, taken off the abort_io
list, sent to lpfc_release_nvme_buf where it was put back on the abort_io
list because the lpfc_ncmd->flags setting LPFC_SBUF_XBUSY was not cleared
on the final completion.
Rework the exchange busy handling to ensure the flags are properly set for
both scsi and nvme.
Fixes: c490850a09 ("scsi: lpfc: Adapt partitioned XRI lists to efficient sharing")
Cc: <stable@vger.kernel.org> # v5.1+
Link: https://lore.kernel.org/r/20191018211832.7917-6-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>