linux/drivers/s390/scsi/zfcp_fsf.c

2537 lines
71 KiB
C
Raw Normal View History

License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 14:07:57 +00:00
// SPDX-License-Identifier: GPL-2.0
/*
* zfcp device driver
*
* Implementation of FSF commands.
*
* Copyright IBM Corp. 2002, 2018
*/
#define KMSG_COMPONENT "zfcp"
#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
#include <linux/blktrace_api.h>
#include <linux/jiffies.h>
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
#include <linux/types.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 08:04:11 +00:00
#include <linux/slab.h>
#include <scsi/fc/fc_els.h>
#include "zfcp_ext.h"
#include "zfcp_fc.h"
#include "zfcp_dbf.h"
#include "zfcp_qdio.h"
#include "zfcp_reqlist.h"
#include "zfcp_diag.h"
/* timeout for FSF requests sent during scsi_eh: abort or FCP TMF */
#define ZFCP_FSF_SCSI_ER_TIMEOUT (10*HZ)
/* timeout for: exchange config/port data outside ERP, or open/close WKA port */
#define ZFCP_FSF_REQUEST_TIMEOUT (60*HZ)
struct kmem_cache *zfcp_fsf_qtcb_cache;
scsi: zfcp: fix reaction on bit error threshold notification On excessive bit errors for the FCP channel ingress fibre path, the channel notifies us. Previously, we only emitted a kernel message and a trace record. Since performance can become suboptimal with I/O timeouts due to bit errors, we now stop using an FCP device by default on channel notification so multipath on top can timely failover to other paths. A new module parameter zfcp.ber_stop can be used to get zfcp old behavior. User explanation of new kernel message: * Description: * The FCP channel reported that its bit error threshold has been exceeded. * These errors might result from a problem with the physical components * of the local fibre link into the FCP channel. * The problem might be damage or malfunction of the cable or * cable connection between the FCP channel and * the adjacent fabric switch port or the point-to-point peer. * Find details about the errors in the HBA trace for the FCP device. * The zfcp device driver closed down the FCP device * to limit the performance impact from possible I/O command timeouts. * User action: * Check for problems on the local fibre link, ensure that fibre optics are * clean and functional, and all cables are properly plugged. * After the repair action, you can manually recover the FCP device by * writing "0" into its "failed" sysfs attribute. * If recovery through sysfs is not possible, set the CHPID of the device * offline and back online on the service element. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: <stable@vger.kernel.org> #2.6.30+ Link: https://lore.kernel.org/r/20191001104949.42810-1-maier@linux.ibm.com Reviewed-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-01 10:49:49 +00:00
static bool ber_stop = true;
module_param(ber_stop, bool, 0600);
MODULE_PARM_DESC(ber_stop,
"Shuts down FCP devices for FCP channels that report a bit-error count in excess of its threshold (default on)");
static void zfcp_fsf_request_timeout_handler(struct timer_list *t)
{
struct zfcp_fsf_req *fsf_req = from_timer(fsf_req, t, timer);
struct zfcp_adapter *adapter = fsf_req->adapter;
zfcp_qdio_siosl(adapter);
zfcp_erp_adapter_reopen(adapter, ZFCP_STATUS_COMMON_ERP_FAILED,
"fsrth_1");
}
static void zfcp_fsf_start_timer(struct zfcp_fsf_req *fsf_req,
unsigned long timeout)
{
fsf_req->timer.function = zfcp_fsf_request_timeout_handler;
fsf_req->timer.expires = jiffies + timeout;
add_timer(&fsf_req->timer);
}
static void zfcp_fsf_start_erp_timer(struct zfcp_fsf_req *fsf_req)
{
BUG_ON(!fsf_req->erp_action);
fsf_req->timer.function = zfcp_erp_timeout_handler;
fsf_req->timer.expires = jiffies + 30 * HZ;
add_timer(&fsf_req->timer);
}
/* association between FSF command and FSF QTCB type */
static u32 fsf_qtcb_type[] = {
[FSF_QTCB_FCP_CMND] = FSF_IO_COMMAND,
[FSF_QTCB_ABORT_FCP_CMND] = FSF_SUPPORT_COMMAND,
[FSF_QTCB_OPEN_PORT_WITH_DID] = FSF_SUPPORT_COMMAND,
[FSF_QTCB_OPEN_LUN] = FSF_SUPPORT_COMMAND,
[FSF_QTCB_CLOSE_LUN] = FSF_SUPPORT_COMMAND,
[FSF_QTCB_CLOSE_PORT] = FSF_SUPPORT_COMMAND,
[FSF_QTCB_CLOSE_PHYSICAL_PORT] = FSF_SUPPORT_COMMAND,
[FSF_QTCB_SEND_ELS] = FSF_SUPPORT_COMMAND,
[FSF_QTCB_SEND_GENERIC] = FSF_SUPPORT_COMMAND,
[FSF_QTCB_EXCHANGE_CONFIG_DATA] = FSF_CONFIG_COMMAND,
[FSF_QTCB_EXCHANGE_PORT_DATA] = FSF_PORT_COMMAND,
[FSF_QTCB_DOWNLOAD_CONTROL_FILE] = FSF_SUPPORT_COMMAND,
[FSF_QTCB_UPLOAD_CONTROL_FILE] = FSF_SUPPORT_COMMAND
};
static void zfcp_fsf_class_not_supp(struct zfcp_fsf_req *req)
{
dev_err(&req->adapter->ccw_device->dev, "FCP device not "
"operational because of an unsupported FC class\n");
zfcp_erp_adapter_shutdown(req->adapter, 0, "fscns_1");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
}
/**
* zfcp_fsf_req_free - free memory used by fsf request
* @req: pointer to struct zfcp_fsf_req
*/
void zfcp_fsf_req_free(struct zfcp_fsf_req *req)
{
if (likely(req->pool)) {
scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header Status read buffers (SRBs, unsolicited notifications) never use a QTCB [zfcp_fsf_req_create()]. zfcp_fsf_req_send() already uses this to distinguish SRBs from other FSF request types. We can re-use this method in zfcp_fsf_req_complete(). Introduce a helper function to make the check for req->qtcb less magic. SRBs always are FSF_QTCB_UNSOLICITED_STATUS, so we can hard-code this for the two trace functions dealing with SRBs. All other FSF request types have a QTCB and we can get the fsf_command from there. zfcp_dbf_hba_fsf_response() and thus zfcp_dbf_hba_fsf_res() are only called for non-SRB requests so it's safe to dereference the QTCB [zfcp_fsf_req_complete() returns early on SRB, else calls zfcp_fsf_protstatus_eval() which calls zfcp_dbf_hba_fsf_response()]. In zfcp_scsi_forget_cmnd() we guard the QTCB dereference with a preceding NULL check and rely on boolean shortcut evaluation. As a side effect, this causes an alignment hole which we can close in a later patch after having cleaned up all fields of struct zfcp_fsf_req. Before: $ pahole -C zfcp_fsf_req drivers/s390/scsi/zfcp.ko ... u32 status; /* 136 4 */ u32 fsf_command; /* 140 4 */ struct fsf_qtcb * qtcb; /* 144 8 */ ... After: $ pahole -C zfcp_fsf_req drivers/s390/scsi/zfcp.ko ... u32 status; /* 136 4 */ /* XXX 4 bytes hole, try to pack */ struct fsf_qtcb * qtcb; /* 144 8 */ ... Signed-off-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-08 14:44:45 +00:00
if (likely(!zfcp_fsf_req_is_status_read_buffer(req)))
mempool_free(req->qtcb, req->adapter->pool.qtcb_pool);
mempool_free(req, req->pool);
return;
}
scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header Status read buffers (SRBs, unsolicited notifications) never use a QTCB [zfcp_fsf_req_create()]. zfcp_fsf_req_send() already uses this to distinguish SRBs from other FSF request types. We can re-use this method in zfcp_fsf_req_complete(). Introduce a helper function to make the check for req->qtcb less magic. SRBs always are FSF_QTCB_UNSOLICITED_STATUS, so we can hard-code this for the two trace functions dealing with SRBs. All other FSF request types have a QTCB and we can get the fsf_command from there. zfcp_dbf_hba_fsf_response() and thus zfcp_dbf_hba_fsf_res() are only called for non-SRB requests so it's safe to dereference the QTCB [zfcp_fsf_req_complete() returns early on SRB, else calls zfcp_fsf_protstatus_eval() which calls zfcp_dbf_hba_fsf_response()]. In zfcp_scsi_forget_cmnd() we guard the QTCB dereference with a preceding NULL check and rely on boolean shortcut evaluation. As a side effect, this causes an alignment hole which we can close in a later patch after having cleaned up all fields of struct zfcp_fsf_req. Before: $ pahole -C zfcp_fsf_req drivers/s390/scsi/zfcp.ko ... u32 status; /* 136 4 */ u32 fsf_command; /* 140 4 */ struct fsf_qtcb * qtcb; /* 144 8 */ ... After: $ pahole -C zfcp_fsf_req drivers/s390/scsi/zfcp.ko ... u32 status; /* 136 4 */ /* XXX 4 bytes hole, try to pack */ struct fsf_qtcb * qtcb; /* 144 8 */ ... Signed-off-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-08 14:44:45 +00:00
if (likely(!zfcp_fsf_req_is_status_read_buffer(req)))
kmem_cache_free(zfcp_fsf_qtcb_cache, req->qtcb);
kfree(req);
}
static void zfcp_fsf_status_read_port_closed(struct zfcp_fsf_req *req)
{
unsigned long flags;
struct fsf_status_read_buffer *sr_buf = req->data;
struct zfcp_adapter *adapter = req->adapter;
struct zfcp_port *port;
int d_id = ntoh24(sr_buf->d_id);
read_lock_irqsave(&adapter->port_list_lock, flags);
list_for_each_entry(port, &adapter->port_list, list)
if (port->d_id == d_id) {
zfcp_erp_port_reopen(port, 0, "fssrpc1");
break;
}
read_unlock_irqrestore(&adapter->port_list_lock, flags);
}
static void zfcp_fsf_link_down_info_eval(struct zfcp_fsf_req *req,
struct fsf_link_down_info *link_down)
{
struct zfcp_adapter *adapter = req->adapter;
if (atomic_read(&adapter->status) & ZFCP_STATUS_ADAPTER_LINK_UNPLUGGED)
return;
atomic_or(ZFCP_STATUS_ADAPTER_LINK_UNPLUGGED, &adapter->status);
zfcp_scsi_schedule_rports_block(adapter);
if (!link_down)
goto out;
switch (link_down->error_code) {
case FSF_PSQ_LINK_NO_LIGHT:
dev_warn(&req->adapter->ccw_device->dev,
"There is no light signal from the local "
"fibre channel cable\n");
break;
case FSF_PSQ_LINK_WRAP_PLUG:
dev_warn(&req->adapter->ccw_device->dev,
"There is a wrap plug instead of a fibre "
"channel cable\n");
break;
case FSF_PSQ_LINK_NO_FCP:
dev_warn(&req->adapter->ccw_device->dev,
"The adjacent fibre channel node does not "
"support FCP\n");
break;
case FSF_PSQ_LINK_FIRMWARE_UPDATE:
dev_warn(&req->adapter->ccw_device->dev,
"The FCP device is suspended because of a "
"firmware update\n");
break;
case FSF_PSQ_LINK_INVALID_WWPN:
dev_warn(&req->adapter->ccw_device->dev,
"The FCP device detected a WWPN that is "
"duplicate or not valid\n");
break;
case FSF_PSQ_LINK_NO_NPIV_SUPPORT:
dev_warn(&req->adapter->ccw_device->dev,
"The fibre channel fabric does not support NPIV\n");
break;
case FSF_PSQ_LINK_NO_FCP_RESOURCES:
dev_warn(&req->adapter->ccw_device->dev,
"The FCP adapter cannot support more NPIV ports\n");
break;
case FSF_PSQ_LINK_NO_FABRIC_RESOURCES:
dev_warn(&req->adapter->ccw_device->dev,
"The adjacent switch cannot support "
"more NPIV ports\n");
break;
case FSF_PSQ_LINK_FABRIC_LOGIN_UNABLE:
dev_warn(&req->adapter->ccw_device->dev,
"The FCP adapter could not log in to the "
"fibre channel fabric\n");
break;
case FSF_PSQ_LINK_WWPN_ASSIGNMENT_CORRUPTED:
dev_warn(&req->adapter->ccw_device->dev,
"The WWPN assignment file on the FCP adapter "
"has been damaged\n");
break;
case FSF_PSQ_LINK_MODE_TABLE_CURRUPTED:
dev_warn(&req->adapter->ccw_device->dev,
"The mode table on the FCP adapter "
"has been damaged\n");
break;
case FSF_PSQ_LINK_NO_WWPN_ASSIGNMENT:
dev_warn(&req->adapter->ccw_device->dev,
"All NPIV ports on the FCP adapter have "
"been assigned\n");
break;
default:
dev_warn(&req->adapter->ccw_device->dev,
"The link between the FCP adapter and "
"the FC fabric is down\n");
}
out:
zfcp_erp_set_adapter_status(adapter, ZFCP_STATUS_COMMON_ERP_FAILED);
}
static void zfcp_fsf_status_read_link_down(struct zfcp_fsf_req *req)
{
struct fsf_status_read_buffer *sr_buf = req->data;
struct fsf_link_down_info *ldi =
(struct fsf_link_down_info *) &sr_buf->payload;
switch (sr_buf->status_subtype) {
case FSF_STATUS_READ_SUB_NO_PHYSICAL_LINK:
case FSF_STATUS_READ_SUB_FDISC_FAILED:
zfcp_fsf_link_down_info_eval(req, ldi);
break;
case FSF_STATUS_READ_SUB_FIRMWARE_UPDATE:
zfcp_fsf_link_down_info_eval(req, NULL);
}
}
static void zfcp_fsf_status_read_handler(struct zfcp_fsf_req *req)
{
struct zfcp_adapter *adapter = req->adapter;
struct fsf_status_read_buffer *sr_buf = req->data;
if (req->status & ZFCP_STATUS_FSFREQ_DISMISSED) {
zfcp_dbf_hba_fsf_uss("fssrh_1", req);
mempool_free(virt_to_page(sr_buf), adapter->pool.sr_data);
zfcp_fsf_req_free(req);
return;
}
zfcp_dbf_hba_fsf_uss("fssrh_4", req);
switch (sr_buf->status_type) {
case FSF_STATUS_READ_PORT_CLOSED:
zfcp_fsf_status_read_port_closed(req);
break;
case FSF_STATUS_READ_INCOMING_ELS:
zfcp_fc_incoming_els(req);
break;
case FSF_STATUS_READ_SENSE_DATA_AVAIL:
break;
case FSF_STATUS_READ_BIT_ERROR_THRESHOLD:
zfcp_dbf_hba_bit_err("fssrh_3", req);
scsi: zfcp: fix reaction on bit error threshold notification On excessive bit errors for the FCP channel ingress fibre path, the channel notifies us. Previously, we only emitted a kernel message and a trace record. Since performance can become suboptimal with I/O timeouts due to bit errors, we now stop using an FCP device by default on channel notification so multipath on top can timely failover to other paths. A new module parameter zfcp.ber_stop can be used to get zfcp old behavior. User explanation of new kernel message: * Description: * The FCP channel reported that its bit error threshold has been exceeded. * These errors might result from a problem with the physical components * of the local fibre link into the FCP channel. * The problem might be damage or malfunction of the cable or * cable connection between the FCP channel and * the adjacent fabric switch port or the point-to-point peer. * Find details about the errors in the HBA trace for the FCP device. * The zfcp device driver closed down the FCP device * to limit the performance impact from possible I/O command timeouts. * User action: * Check for problems on the local fibre link, ensure that fibre optics are * clean and functional, and all cables are properly plugged. * After the repair action, you can manually recover the FCP device by * writing "0" into its "failed" sysfs attribute. * If recovery through sysfs is not possible, set the CHPID of the device * offline and back online on the service element. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: <stable@vger.kernel.org> #2.6.30+ Link: https://lore.kernel.org/r/20191001104949.42810-1-maier@linux.ibm.com Reviewed-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-01 10:49:49 +00:00
if (ber_stop) {
dev_warn(&adapter->ccw_device->dev,
"All paths over this FCP device are disused because of excessive bit errors\n");
zfcp_erp_adapter_shutdown(adapter, 0, "fssrh_b");
} else {
dev_warn(&adapter->ccw_device->dev,
"The error threshold for checksum statistics has been exceeded\n");
}
break;
case FSF_STATUS_READ_LINK_DOWN:
zfcp_fsf_status_read_link_down(req);
zfcp_fc_enqueue_event(adapter, FCH_EVT_LINKDOWN, 0);
break;
case FSF_STATUS_READ_LINK_UP:
dev_info(&adapter->ccw_device->dev,
"The local link has been restored\n");
/* All ports should be marked as ready to run again */
zfcp_erp_set_adapter_status(adapter,
ZFCP_STATUS_COMMON_RUNNING);
zfcp_erp_adapter_reopen(adapter,
ZFCP_STATUS_ADAPTER_LINK_UNPLUGGED |
ZFCP_STATUS_COMMON_ERP_FAILED,
"fssrh_2");
zfcp_fc_enqueue_event(adapter, FCH_EVT_LINKUP, 0);
break;
case FSF_STATUS_READ_NOTIFICATION_LOST:
if (sr_buf->status_subtype & FSF_STATUS_READ_SUB_INCOMING_ELS)
zfcp_fc_conditional_port_scan(adapter);
break;
case FSF_STATUS_READ_FEATURE_UPDATE_ALERT:
adapter->adapter_features = sr_buf->payload.word[0];
break;
}
mempool_free(virt_to_page(sr_buf), adapter->pool.sr_data);
zfcp_fsf_req_free(req);
atomic_inc(&adapter->stat_miss);
queue_work(adapter->work_queue, &adapter->stat_work);
}
static void zfcp_fsf_fsfstatus_qual_eval(struct zfcp_fsf_req *req)
{
switch (req->qtcb->header.fsf_status_qual.word[0]) {
case FSF_SQ_FCP_RSP_AVAILABLE:
case FSF_SQ_INVOKE_LINK_TEST_PROCEDURE:
case FSF_SQ_NO_RETRY_POSSIBLE:
case FSF_SQ_ULP_DEPENDENT_ERP_REQUIRED:
return;
case FSF_SQ_COMMAND_ABORTED:
break;
case FSF_SQ_NO_RECOM:
dev_err(&req->adapter->ccw_device->dev,
"The FCP adapter reported a problem "
"that cannot be recovered\n");
zfcp_qdio_siosl(req->adapter);
zfcp_erp_adapter_shutdown(req->adapter, 0, "fsfsqe1");
break;
}
/* all non-return stats set FSFREQ_ERROR*/
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
}
static void zfcp_fsf_fsfstatus_eval(struct zfcp_fsf_req *req)
{
if (unlikely(req->status & ZFCP_STATUS_FSFREQ_ERROR))
return;
switch (req->qtcb->header.fsf_status) {
case FSF_UNKNOWN_COMMAND:
dev_err(&req->adapter->ccw_device->dev,
"The FCP adapter does not recognize the command 0x%x\n",
req->qtcb->header.fsf_command);
zfcp_erp_adapter_shutdown(req->adapter, 0, "fsfse_1");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_ADAPTER_STATUS_AVAILABLE:
zfcp_fsf_fsfstatus_qual_eval(req);
break;
}
}
static void zfcp_fsf_protstatus_eval(struct zfcp_fsf_req *req)
{
struct zfcp_adapter *adapter = req->adapter;
struct fsf_qtcb *qtcb = req->qtcb;
union fsf_prot_status_qual *psq = &qtcb->prefix.prot_status_qual;
zfcp_dbf_hba_fsf_response(req);
if (req->status & ZFCP_STATUS_FSFREQ_DISMISSED) {
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
return;
}
switch (qtcb->prefix.prot_status) {
case FSF_PROT_GOOD:
case FSF_PROT_FSF_STATUS_PRESENTED:
return;
case FSF_PROT_QTCB_VERSION_ERROR:
dev_err(&adapter->ccw_device->dev,
"QTCB version 0x%x not supported by FCP adapter "
"(0x%x to 0x%x)\n", FSF_QTCB_CURRENT_VERSION,
psq->word[0], psq->word[1]);
zfcp_erp_adapter_shutdown(adapter, 0, "fspse_1");
break;
case FSF_PROT_ERROR_STATE:
case FSF_PROT_SEQ_NUMB_ERROR:
zfcp_erp_adapter_reopen(adapter, 0, "fspse_2");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_PROT_UNSUPP_QTCB_TYPE:
dev_err(&adapter->ccw_device->dev,
"The QTCB type is not supported by the FCP adapter\n");
zfcp_erp_adapter_shutdown(adapter, 0, "fspse_3");
break;
case FSF_PROT_HOST_CONNECTION_INITIALIZING:
atomic_or(ZFCP_STATUS_ADAPTER_HOST_CON_INIT,
&adapter->status);
break;
case FSF_PROT_DUPLICATE_REQUEST_ID:
dev_err(&adapter->ccw_device->dev,
"0x%Lx is an ambiguous request identifier\n",
(unsigned long long)qtcb->bottom.support.req_handle);
zfcp_erp_adapter_shutdown(adapter, 0, "fspse_4");
break;
case FSF_PROT_LINK_DOWN:
zfcp_fsf_link_down_info_eval(req, &psq->link_down_info);
/* go through reopen to flush pending requests */
zfcp_erp_adapter_reopen(adapter, 0, "fspse_6");
break;
case FSF_PROT_REEST_QUEUE:
/* All ports should be marked as ready to run again */
zfcp_erp_set_adapter_status(adapter,
ZFCP_STATUS_COMMON_RUNNING);
zfcp_erp_adapter_reopen(adapter,
ZFCP_STATUS_ADAPTER_LINK_UNPLUGGED |
ZFCP_STATUS_COMMON_ERP_FAILED,
"fspse_8");
break;
default:
dev_err(&adapter->ccw_device->dev,
"0x%x is not a valid transfer protocol status\n",
qtcb->prefix.prot_status);
zfcp_qdio_siosl(adapter);
zfcp_erp_adapter_shutdown(adapter, 0, "fspse_9");
}
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
}
/**
* zfcp_fsf_req_complete - process completion of a FSF request
* @req: The FSF request that has been completed.
*
* When a request has been completed either from the FCP adapter,
* or it has been dismissed due to a queue shutdown, this function
* is called to process the completion status and trigger further
* events related to the FSF request.
*/
static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req)
{
scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header Status read buffers (SRBs, unsolicited notifications) never use a QTCB [zfcp_fsf_req_create()]. zfcp_fsf_req_send() already uses this to distinguish SRBs from other FSF request types. We can re-use this method in zfcp_fsf_req_complete(). Introduce a helper function to make the check for req->qtcb less magic. SRBs always are FSF_QTCB_UNSOLICITED_STATUS, so we can hard-code this for the two trace functions dealing with SRBs. All other FSF request types have a QTCB and we can get the fsf_command from there. zfcp_dbf_hba_fsf_response() and thus zfcp_dbf_hba_fsf_res() are only called for non-SRB requests so it's safe to dereference the QTCB [zfcp_fsf_req_complete() returns early on SRB, else calls zfcp_fsf_protstatus_eval() which calls zfcp_dbf_hba_fsf_response()]. In zfcp_scsi_forget_cmnd() we guard the QTCB dereference with a preceding NULL check and rely on boolean shortcut evaluation. As a side effect, this causes an alignment hole which we can close in a later patch after having cleaned up all fields of struct zfcp_fsf_req. Before: $ pahole -C zfcp_fsf_req drivers/s390/scsi/zfcp.ko ... u32 status; /* 136 4 */ u32 fsf_command; /* 140 4 */ struct fsf_qtcb * qtcb; /* 144 8 */ ... After: $ pahole -C zfcp_fsf_req drivers/s390/scsi/zfcp.ko ... u32 status; /* 136 4 */ /* XXX 4 bytes hole, try to pack */ struct fsf_qtcb * qtcb; /* 144 8 */ ... Signed-off-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-08 14:44:45 +00:00
if (unlikely(zfcp_fsf_req_is_status_read_buffer(req))) {
zfcp_fsf_status_read_handler(req);
return;
}
del_timer(&req->timer);
zfcp_fsf_protstatus_eval(req);
zfcp_fsf_fsfstatus_eval(req);
req->handler(req);
if (req->erp_action)
zfcp_erp_notify(req->erp_action, 0);
if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP))
zfcp_fsf_req_free(req);
else
complete(&req->completion);
}
/**
* zfcp_fsf_req_dismiss_all - dismiss all fsf requests
* @adapter: pointer to struct zfcp_adapter
*
* Never ever call this without shutting down the adapter first.
* Otherwise the adapter would continue using and corrupting s390 storage.
* Included BUG_ON() call to ensure this is done.
* ERP is supposed to be the only user of this function.
*/
void zfcp_fsf_req_dismiss_all(struct zfcp_adapter *adapter)
{
struct zfcp_fsf_req *req, *tmp;
LIST_HEAD(remove_queue);
BUG_ON(atomic_read(&adapter->status) & ZFCP_STATUS_ADAPTER_QDIOUP);
zfcp_reqlist_move(adapter->req_list, &remove_queue);
list_for_each_entry_safe(req, tmp, &remove_queue, list) {
list_del(&req->list);
req->status |= ZFCP_STATUS_FSFREQ_DISMISSED;
zfcp_fsf_req_complete(req);
}
}
#define ZFCP_FSF_PORTSPEED_1GBIT (1 << 0)
#define ZFCP_FSF_PORTSPEED_2GBIT (1 << 1)
#define ZFCP_FSF_PORTSPEED_4GBIT (1 << 2)
#define ZFCP_FSF_PORTSPEED_10GBIT (1 << 3)
#define ZFCP_FSF_PORTSPEED_8GBIT (1 << 4)
#define ZFCP_FSF_PORTSPEED_16GBIT (1 << 5)
#define ZFCP_FSF_PORTSPEED_32GBIT (1 << 6)
#define ZFCP_FSF_PORTSPEED_64GBIT (1 << 7)
#define ZFCP_FSF_PORTSPEED_128GBIT (1 << 8)
#define ZFCP_FSF_PORTSPEED_NOT_NEGOTIATED (1 << 15)
static u32 zfcp_fsf_convert_portspeed(u32 fsf_speed)
{
u32 fdmi_speed = 0;
if (fsf_speed & ZFCP_FSF_PORTSPEED_1GBIT)
fdmi_speed |= FC_PORTSPEED_1GBIT;
if (fsf_speed & ZFCP_FSF_PORTSPEED_2GBIT)
fdmi_speed |= FC_PORTSPEED_2GBIT;
if (fsf_speed & ZFCP_FSF_PORTSPEED_4GBIT)
fdmi_speed |= FC_PORTSPEED_4GBIT;
if (fsf_speed & ZFCP_FSF_PORTSPEED_10GBIT)
fdmi_speed |= FC_PORTSPEED_10GBIT;
if (fsf_speed & ZFCP_FSF_PORTSPEED_8GBIT)
fdmi_speed |= FC_PORTSPEED_8GBIT;
if (fsf_speed & ZFCP_FSF_PORTSPEED_16GBIT)
fdmi_speed |= FC_PORTSPEED_16GBIT;
if (fsf_speed & ZFCP_FSF_PORTSPEED_32GBIT)
fdmi_speed |= FC_PORTSPEED_32GBIT;
if (fsf_speed & ZFCP_FSF_PORTSPEED_64GBIT)
fdmi_speed |= FC_PORTSPEED_64GBIT;
if (fsf_speed & ZFCP_FSF_PORTSPEED_128GBIT)
fdmi_speed |= FC_PORTSPEED_128GBIT;
if (fsf_speed & ZFCP_FSF_PORTSPEED_NOT_NEGOTIATED)
fdmi_speed |= FC_PORTSPEED_NOT_NEGOTIATED;
return fdmi_speed;
}
static int zfcp_fsf_exchange_config_evaluate(struct zfcp_fsf_req *req)
{
struct fsf_qtcb_bottom_config *bottom = &req->qtcb->bottom.config;
struct zfcp_adapter *adapter = req->adapter;
struct Scsi_Host *shost = adapter->scsi_host;
struct fc_els_flogi *nsp, *plogi;
/* adjust pointers for missing command code */
nsp = (struct fc_els_flogi *) ((u8 *)&bottom->nport_serv_param
- sizeof(u32));
plogi = (struct fc_els_flogi *) ((u8 *)&bottom->plogi_payload
- sizeof(u32));
if (req->data)
memcpy(req->data, bottom, sizeof(*bottom));
scsi: zfcp: use endianness conversions with common FC(P) struct fields Just to silence sparse. Since zfcp only exists for s390 and s390 is big endian, this has been working correctly without conversions and all the new conversions are NOPs so no performance impact. Nonetheless, use the conversion on the constant expression where possible. NB: N_Port-IDs have always been handled with hton24 or ntoh24 conversions because they also convert to / from character array. Affected common code structs and .fields are: HOT I/O PATH: fcp_cmnd .fc_dl FCP command: regular SCSI I/O, including DIX case SEMI-HOT I/O PATH: fcp_cmnd .fc_dl recovery FCP command: task management function (LUN / target reset) fcp_resp_ext FCP response having FCP_SNS_LEN_VAL with .fr_rsp_len .fr_sns_len FCP response having FCP_RESID_UNDER with .fr_resid RECOVERY / DISCOVERY PATHS: fc_ct_hdr .ct_cmd .ct_mr_size zfcp auto port scan [GPN_FT] with fc_gpn_ft_resp.fp_wwpn, recovery for returned port [GID_PN] with fc_ns_gid_pn.fn_wwpn, get symbolic port name [GSPN], register symbolic port name [RSPN] (NPIV only). fc_els_rscn .rscn_plen incoming ELS (RSCN). fc_els_flogi .fl_wwpn .fl_wwnn incoming ELS (PLOGI), port open response with .fl_csp.sp_bb_data .fl_cssp[0..3].cp_class, FCP channel physical port, point-to-point peer (P2P only). fc_els_logo .fl_n_port_wwn incoming ELS (LOGO). fc_els_adisc .adisc_wwnn .adisc_wwpn path test after RSCN for gone target port. Since v4.10 commit 05de97003c77 ("linux/types.h: enable endian checks for all sparse builds"), below sparse endianness reports appear by default. Previously, one needed to pass argument CF="-D__CHECK_ENDIAN__" to make as in: $ make C=1 CF="-D__CHECK_ENDIAN__" M=drivers/s390/scsi. Silenced sparse warnings and one error: $ make C=1 M=drivers/s390/scsi ... CHECK drivers/s390/scsi/zfcp_dbf.c drivers/s390/scsi/zfcp_dbf.c:463:22: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_dbf.c:476:28: warning: restricted __be16 degrades to integer CC drivers/s390/scsi/zfcp_dbf.o ... CHECK drivers/s390/scsi/zfcp_fc.c drivers/s390/scsi/zfcp_fc.c:263:26: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:299:41: warning: incorrect type in argument 2 (different base types) drivers/s390/scsi/zfcp_fc.c:299:41: expected unsigned long long [unsigned] [usertype] wwpn drivers/s390/scsi/zfcp_fc.c:299:41: got restricted __be64 [usertype] fl_wwpn drivers/s390/scsi/zfcp_fc.c:309:40: warning: incorrect type in argument 2 (different base types) drivers/s390/scsi/zfcp_fc.c:309:40: expected unsigned long long [unsigned] [usertype] wwpn drivers/s390/scsi/zfcp_fc.c:309:40: got restricted __be64 [usertype] fl_n_port_wwn drivers/s390/scsi/zfcp_fc.c:338:31: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:355:24: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:355:24: expected restricted __be16 [usertype] ct_cmd drivers/s390/scsi/zfcp_fc.c:355:24: got unsigned short [unsigned] [usertype] cmd drivers/s390/scsi/zfcp_fc.c:356:28: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:356:28: expected restricted __be16 [usertype] ct_mr_size drivers/s390/scsi/zfcp_fc.c:356:28: got int drivers/s390/scsi/zfcp_fc.c:379:36: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:379:36: expected restricted __be64 [usertype] fn_wwpn drivers/s390/scsi/zfcp_fc.c:379:36: got unsigned long long [unsigned] [usertype] wwpn drivers/s390/scsi/zfcp_fc.c:463:18: warning: restricted __be64 degrades to integer drivers/s390/scsi/zfcp_fc.c:465:17: warning: cast from restricted __be64 drivers/s390/scsi/zfcp_fc.c:473:20: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:473:20: expected unsigned long long [unsigned] [usertype] wwnn drivers/s390/scsi/zfcp_fc.c:473:20: got restricted __be64 [usertype] fl_wwnn drivers/s390/scsi/zfcp_fc.c:474:29: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:474:29: expected unsigned int [unsigned] [usertype] maxframe_size drivers/s390/scsi/zfcp_fc.c:474:29: got restricted __be16 [usertype] sp_bb_data drivers/s390/scsi/zfcp_fc.c:476:30: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:478:30: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:480:30: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:482:30: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:500:28: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:500:28: expected unsigned long long [unsigned] [usertype] wwnn drivers/s390/scsi/zfcp_fc.c:500:28: got restricted __be64 [usertype] adisc_wwnn drivers/s390/scsi/zfcp_fc.c:502:38: warning: restricted __be64 degrades to integer drivers/s390/scsi/zfcp_fc.c:541:40: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:541:40: expected restricted __be64 [usertype] adisc_wwpn drivers/s390/scsi/zfcp_fc.c:541:40: got unsigned long long [unsigned] [usertype] port_name drivers/s390/scsi/zfcp_fc.c:542:40: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:542:40: expected restricted __be64 [usertype] adisc_wwnn drivers/s390/scsi/zfcp_fc.c:542:40: got unsigned long long [unsigned] [usertype] node_name drivers/s390/scsi/zfcp_fc.c:669:16: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:696:24: warning: restricted __be64 degrades to integer drivers/s390/scsi/zfcp_fc.c:699:54: warning: incorrect type in argument 2 (different base types) drivers/s390/scsi/zfcp_fc.c:699:54: expected unsigned long long [unsigned] [usertype] <noident> drivers/s390/scsi/zfcp_fc.c:699:54: got restricted __be64 [usertype] fp_wwpn CC drivers/s390/scsi/zfcp_fc.o CHECK drivers/s390/scsi/zfcp_fsf.c drivers/s390/scsi/zfcp_fsf.c:479:34: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fsf.c:479:34: expected unsigned long long [unsigned] [usertype] port_name drivers/s390/scsi/zfcp_fsf.c:479:34: got restricted __be64 [usertype] fl_wwpn drivers/s390/scsi/zfcp_fsf.c:480:34: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fsf.c:480:34: expected unsigned long long [unsigned] [usertype] node_name drivers/s390/scsi/zfcp_fsf.c:480:34: got restricted __be64 [usertype] fl_wwnn drivers/s390/scsi/zfcp_fsf.c:506:36: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fsf.c:506:36: expected unsigned long long [unsigned] [usertype] peer_wwpn drivers/s390/scsi/zfcp_fsf.c:506:36: got restricted __be64 [usertype] fl_wwpn drivers/s390/scsi/zfcp_fsf.c:507:36: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fsf.c:507:36: expected unsigned long long [unsigned] [usertype] peer_wwnn drivers/s390/scsi/zfcp_fsf.c:507:36: got restricted __be64 [usertype] fl_wwnn drivers/s390/scsi/zfcp_fc.h:269:46: warning: restricted __be32 degrades to integer drivers/s390/scsi/zfcp_fc.h:270:29: error: incompatible types in comparison expression (different base types) Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-07-28 10:31:02 +00:00
fc_host_port_name(shost) = be64_to_cpu(nsp->fl_wwpn);
fc_host_node_name(shost) = be64_to_cpu(nsp->fl_wwnn);
fc_host_supported_classes(shost) = FC_COS_CLASS2 | FC_COS_CLASS3;
adapter->timer_ticks = bottom->timer_interval & ZFCP_FSF_TIMER_INT_MASK;
adapter->stat_read_buf_num = max(bottom->status_read_buf_num,
(u16)FSF_STATUS_READS_RECOM);
if (fc_host_permanent_port_name(shost) == -1)
fc_host_permanent_port_name(shost) = fc_host_port_name(shost);
[SCSI] zfcp: status read buffers on first adapter open with link down Commit 64deb6efdc5504ce97b5c1c6f281fffbc150bd93 "[SCSI] zfcp: Use status_read_buf_num provided by FCP channel" started using a value returned by the channel but only evaluated the value if the fabric link is up. Commit 8d88cf3f3b9af4713642caeb221b6d6a42019001 "[SCSI] zfcp: Update status read mempool" introduced mempool resizings based on the above value. On setting an FCP device online for the very first time since boot, a new zeroed adapter object is allocated. If the link is down, the number of status read requests remains zero. Since just the config data exchange is incomplete, we proceed with adapter open recovery. However, we unconditionally call mempool_resize with adapter->stat_read_buf_num == 0 in this case. This causes a kernel message "kernel BUG at mm/mempool.c:131!" in process "zfcperp<FCP-device-bus-ID>" with last function mempool_resize in Krnl PSW and zfcp_erp_thread in the Call Trace. Don't evaluate channel values which are invalid on link down. The number of status read requests is always valid, evaluated, and set to a positive minimum greater than zero. The adapter open recovery can proceed and the channel has status read buffers to inform us on a future link up event. While we are not aware of any other code path that could result in mempool resize attempts of size zero, we still also initialize the number of status read buffers to be posted to a static minimum number on adapter object allocation. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> #2.6.35+ Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-04-26 15:34:54 +00:00
zfcp_scsi_set_prot(adapter);
/* no error return above here, otherwise must fix call chains */
/* do not evaluate invalid fields */
if (req->qtcb->header.fsf_status == FSF_EXCHANGE_CONFIG_DATA_INCOMPLETE)
return 0;
fc_host_port_id(shost) = ntoh24(bottom->s_id);
fc_host_speed(shost) =
zfcp_fsf_convert_portspeed(bottom->fc_link_speed);
adapter->hydra_version = bottom->adapter_type;
switch (bottom->fc_topology) {
case FSF_TOPO_P2P:
adapter->peer_d_id = ntoh24(bottom->peer_d_id);
scsi: zfcp: use endianness conversions with common FC(P) struct fields Just to silence sparse. Since zfcp only exists for s390 and s390 is big endian, this has been working correctly without conversions and all the new conversions are NOPs so no performance impact. Nonetheless, use the conversion on the constant expression where possible. NB: N_Port-IDs have always been handled with hton24 or ntoh24 conversions because they also convert to / from character array. Affected common code structs and .fields are: HOT I/O PATH: fcp_cmnd .fc_dl FCP command: regular SCSI I/O, including DIX case SEMI-HOT I/O PATH: fcp_cmnd .fc_dl recovery FCP command: task management function (LUN / target reset) fcp_resp_ext FCP response having FCP_SNS_LEN_VAL with .fr_rsp_len .fr_sns_len FCP response having FCP_RESID_UNDER with .fr_resid RECOVERY / DISCOVERY PATHS: fc_ct_hdr .ct_cmd .ct_mr_size zfcp auto port scan [GPN_FT] with fc_gpn_ft_resp.fp_wwpn, recovery for returned port [GID_PN] with fc_ns_gid_pn.fn_wwpn, get symbolic port name [GSPN], register symbolic port name [RSPN] (NPIV only). fc_els_rscn .rscn_plen incoming ELS (RSCN). fc_els_flogi .fl_wwpn .fl_wwnn incoming ELS (PLOGI), port open response with .fl_csp.sp_bb_data .fl_cssp[0..3].cp_class, FCP channel physical port, point-to-point peer (P2P only). fc_els_logo .fl_n_port_wwn incoming ELS (LOGO). fc_els_adisc .adisc_wwnn .adisc_wwpn path test after RSCN for gone target port. Since v4.10 commit 05de97003c77 ("linux/types.h: enable endian checks for all sparse builds"), below sparse endianness reports appear by default. Previously, one needed to pass argument CF="-D__CHECK_ENDIAN__" to make as in: $ make C=1 CF="-D__CHECK_ENDIAN__" M=drivers/s390/scsi. Silenced sparse warnings and one error: $ make C=1 M=drivers/s390/scsi ... CHECK drivers/s390/scsi/zfcp_dbf.c drivers/s390/scsi/zfcp_dbf.c:463:22: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_dbf.c:476:28: warning: restricted __be16 degrades to integer CC drivers/s390/scsi/zfcp_dbf.o ... CHECK drivers/s390/scsi/zfcp_fc.c drivers/s390/scsi/zfcp_fc.c:263:26: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:299:41: warning: incorrect type in argument 2 (different base types) drivers/s390/scsi/zfcp_fc.c:299:41: expected unsigned long long [unsigned] [usertype] wwpn drivers/s390/scsi/zfcp_fc.c:299:41: got restricted __be64 [usertype] fl_wwpn drivers/s390/scsi/zfcp_fc.c:309:40: warning: incorrect type in argument 2 (different base types) drivers/s390/scsi/zfcp_fc.c:309:40: expected unsigned long long [unsigned] [usertype] wwpn drivers/s390/scsi/zfcp_fc.c:309:40: got restricted __be64 [usertype] fl_n_port_wwn drivers/s390/scsi/zfcp_fc.c:338:31: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:355:24: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:355:24: expected restricted __be16 [usertype] ct_cmd drivers/s390/scsi/zfcp_fc.c:355:24: got unsigned short [unsigned] [usertype] cmd drivers/s390/scsi/zfcp_fc.c:356:28: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:356:28: expected restricted __be16 [usertype] ct_mr_size drivers/s390/scsi/zfcp_fc.c:356:28: got int drivers/s390/scsi/zfcp_fc.c:379:36: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:379:36: expected restricted __be64 [usertype] fn_wwpn drivers/s390/scsi/zfcp_fc.c:379:36: got unsigned long long [unsigned] [usertype] wwpn drivers/s390/scsi/zfcp_fc.c:463:18: warning: restricted __be64 degrades to integer drivers/s390/scsi/zfcp_fc.c:465:17: warning: cast from restricted __be64 drivers/s390/scsi/zfcp_fc.c:473:20: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:473:20: expected unsigned long long [unsigned] [usertype] wwnn drivers/s390/scsi/zfcp_fc.c:473:20: got restricted __be64 [usertype] fl_wwnn drivers/s390/scsi/zfcp_fc.c:474:29: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:474:29: expected unsigned int [unsigned] [usertype] maxframe_size drivers/s390/scsi/zfcp_fc.c:474:29: got restricted __be16 [usertype] sp_bb_data drivers/s390/scsi/zfcp_fc.c:476:30: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:478:30: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:480:30: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:482:30: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:500:28: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:500:28: expected unsigned long long [unsigned] [usertype] wwnn drivers/s390/scsi/zfcp_fc.c:500:28: got restricted __be64 [usertype] adisc_wwnn drivers/s390/scsi/zfcp_fc.c:502:38: warning: restricted __be64 degrades to integer drivers/s390/scsi/zfcp_fc.c:541:40: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:541:40: expected restricted __be64 [usertype] adisc_wwpn drivers/s390/scsi/zfcp_fc.c:541:40: got unsigned long long [unsigned] [usertype] port_name drivers/s390/scsi/zfcp_fc.c:542:40: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fc.c:542:40: expected restricted __be64 [usertype] adisc_wwnn drivers/s390/scsi/zfcp_fc.c:542:40: got unsigned long long [unsigned] [usertype] node_name drivers/s390/scsi/zfcp_fc.c:669:16: warning: restricted __be16 degrades to integer drivers/s390/scsi/zfcp_fc.c:696:24: warning: restricted __be64 degrades to integer drivers/s390/scsi/zfcp_fc.c:699:54: warning: incorrect type in argument 2 (different base types) drivers/s390/scsi/zfcp_fc.c:699:54: expected unsigned long long [unsigned] [usertype] <noident> drivers/s390/scsi/zfcp_fc.c:699:54: got restricted __be64 [usertype] fp_wwpn CC drivers/s390/scsi/zfcp_fc.o CHECK drivers/s390/scsi/zfcp_fsf.c drivers/s390/scsi/zfcp_fsf.c:479:34: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fsf.c:479:34: expected unsigned long long [unsigned] [usertype] port_name drivers/s390/scsi/zfcp_fsf.c:479:34: got restricted __be64 [usertype] fl_wwpn drivers/s390/scsi/zfcp_fsf.c:480:34: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fsf.c:480:34: expected unsigned long long [unsigned] [usertype] node_name drivers/s390/scsi/zfcp_fsf.c:480:34: got restricted __be64 [usertype] fl_wwnn drivers/s390/scsi/zfcp_fsf.c:506:36: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fsf.c:506:36: expected unsigned long long [unsigned] [usertype] peer_wwpn drivers/s390/scsi/zfcp_fsf.c:506:36: got restricted __be64 [usertype] fl_wwpn drivers/s390/scsi/zfcp_fsf.c:507:36: warning: incorrect type in assignment (different base types) drivers/s390/scsi/zfcp_fsf.c:507:36: expected unsigned long long [unsigned] [usertype] peer_wwnn drivers/s390/scsi/zfcp_fsf.c:507:36: got restricted __be64 [usertype] fl_wwnn drivers/s390/scsi/zfcp_fc.h:269:46: warning: restricted __be32 degrades to integer drivers/s390/scsi/zfcp_fc.h:270:29: error: incompatible types in comparison expression (different base types) Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-07-28 10:31:02 +00:00
adapter->peer_wwpn = be64_to_cpu(plogi->fl_wwpn);
adapter->peer_wwnn = be64_to_cpu(plogi->fl_wwnn);
fc_host_port_type(shost) = FC_PORTTYPE_PTP;
scsi: zfcp: expose fabric name as common fc_host sysfs attribute FICON Express8S or older, as well as card features newer than FICON Express16S+ have no certain firmware level requirement. FICON Express16S or FICON Express16S+ have the following minimum firmware level requirements to show a proper fabric name value: z13 machine FICON Express16S , MCL P08424.005 , LIC version 0x00000721 z14 machine FICON Express16S , MCL P42611.008 , LIC version 0x10200069 FICON Express16S+ , MCL P42625.010 , LIC version 0x10300147 Otherwise, the read value is not the fabric name. Each FCP channel of these card features might need one SAN fabric re-login after concurrent microcode update in order to show the proper fabric name. Possible ways to trigger a SAN fabric re-login are one of: Pull fibres between FCP channel port and SAN switch port on either side and re-plug, disable SAN switch port adjacent to FCP channel port and re-enable switch port, or at Service Element toggle off all CHPIDs of FCP channel over all LPARs and toggle CHPIDs on again. Zfcp operating subchannels (FCP devices) on such FCP channel recovers a fabric re-login. Initialize fabric name for any topology and have it an invalid WWPN 0x0 for anything but fabric topology. Otherwise for e.g. point-to-point topology one could see the initial -1 from fc_host_setup() and after a link unplug our fabric name would turn to 0x0 (with subsequent commit ("zfcp: fix fc_host attributes that should be unknown on local link down") and stay 0x0 on link replug. I did not initialize to 0x0 somewhere even earlier in the code path such that it would not flap from real to 0x0 to real on e.g. an exchange config data with fabric topology. Link: https://lore.kernel.org/r/20200312174505.51294-3-maier@linux.ibm.com Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-12 17:44:57 +00:00
fc_host_fabric_name(shost) = 0;
break;
case FSF_TOPO_FABRIC:
scsi: zfcp: expose fabric name as common fc_host sysfs attribute FICON Express8S or older, as well as card features newer than FICON Express16S+ have no certain firmware level requirement. FICON Express16S or FICON Express16S+ have the following minimum firmware level requirements to show a proper fabric name value: z13 machine FICON Express16S , MCL P08424.005 , LIC version 0x00000721 z14 machine FICON Express16S , MCL P42611.008 , LIC version 0x10200069 FICON Express16S+ , MCL P42625.010 , LIC version 0x10300147 Otherwise, the read value is not the fabric name. Each FCP channel of these card features might need one SAN fabric re-login after concurrent microcode update in order to show the proper fabric name. Possible ways to trigger a SAN fabric re-login are one of: Pull fibres between FCP channel port and SAN switch port on either side and re-plug, disable SAN switch port adjacent to FCP channel port and re-enable switch port, or at Service Element toggle off all CHPIDs of FCP channel over all LPARs and toggle CHPIDs on again. Zfcp operating subchannels (FCP devices) on such FCP channel recovers a fabric re-login. Initialize fabric name for any topology and have it an invalid WWPN 0x0 for anything but fabric topology. Otherwise for e.g. point-to-point topology one could see the initial -1 from fc_host_setup() and after a link unplug our fabric name would turn to 0x0 (with subsequent commit ("zfcp: fix fc_host attributes that should be unknown on local link down") and stay 0x0 on link replug. I did not initialize to 0x0 somewhere even earlier in the code path such that it would not flap from real to 0x0 to real on e.g. an exchange config data with fabric topology. Link: https://lore.kernel.org/r/20200312174505.51294-3-maier@linux.ibm.com Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-12 17:44:57 +00:00
fc_host_fabric_name(shost) = be64_to_cpu(plogi->fl_wwnn);
zfcp: fix fc_host port_type with NPIV For an NPIV-enabled FCP device, zfcp can erroneously show "NPort (fabric via point-to-point)" instead of "NPIV VPORT" for the port_type sysfs attribute of the corresponding fc_host. s390-tools that can be affected are dbginfo.sh and ziomon. zfcp_fsf_exchange_config_evaluate() ignores fsf_qtcb_bottom_config.connection_features indicating NPIV and only sets fc_host_port_type to FC_PORTTYPE_NPORT if fsf_qtcb_bottom_config.fc_topology is FSF_TOPO_FABRIC. Only the independent zfcp_fsf_exchange_port_evaluate() evaluates connection_features to overwrite fc_host_port_type to FC_PORTTYPE_NPIV in case of NPIV. Code was introduced with upstream kernel 2.6.30 commit 0282985da5923fa6365adcc1a1586ae0c13c1617 ("[SCSI] zfcp: Report fc_host_port_type as NPIV"). This works during FCP device recovery (such as set online) because it performs FSF_QTCB_EXCHANGE_CONFIG_DATA followed by FSF_QTCB_EXCHANGE_PORT_DATA in sequence. However, the zfcp-specific scsi host sysfs attributes "requests", "megabytes", or "seconds_active" trigger only zfcp_fsf_exchange_config_evaluate() resetting fc_host port_type to FC_PORTTYPE_NPORT despite NPIV. The zfcp-specific scsi host sysfs attribute "utilization" triggers only zfcp_fsf_exchange_port_evaluate() correcting the fc_host port_type again in case of NPIV. Evaluate fsf_qtcb_bottom_config.connection_features in zfcp_fsf_exchange_config_evaluate() where it belongs to. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 0282985da592 ("[SCSI] zfcp: Report fc_host_port_type as NPIV") Cc: <stable@vger.kernel.org> #2.6.30+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-08-10 16:30:44 +00:00
if (bottom->connection_features & FSF_FEATURE_NPIV_MODE)
fc_host_port_type(shost) = FC_PORTTYPE_NPIV;
else
fc_host_port_type(shost) = FC_PORTTYPE_NPORT;
break;
case FSF_TOPO_AL:
fc_host_port_type(shost) = FC_PORTTYPE_NLPORT;
scsi: zfcp: expose fabric name as common fc_host sysfs attribute FICON Express8S or older, as well as card features newer than FICON Express16S+ have no certain firmware level requirement. FICON Express16S or FICON Express16S+ have the following minimum firmware level requirements to show a proper fabric name value: z13 machine FICON Express16S , MCL P08424.005 , LIC version 0x00000721 z14 machine FICON Express16S , MCL P42611.008 , LIC version 0x10200069 FICON Express16S+ , MCL P42625.010 , LIC version 0x10300147 Otherwise, the read value is not the fabric name. Each FCP channel of these card features might need one SAN fabric re-login after concurrent microcode update in order to show the proper fabric name. Possible ways to trigger a SAN fabric re-login are one of: Pull fibres between FCP channel port and SAN switch port on either side and re-plug, disable SAN switch port adjacent to FCP channel port and re-enable switch port, or at Service Element toggle off all CHPIDs of FCP channel over all LPARs and toggle CHPIDs on again. Zfcp operating subchannels (FCP devices) on such FCP channel recovers a fabric re-login. Initialize fabric name for any topology and have it an invalid WWPN 0x0 for anything but fabric topology. Otherwise for e.g. point-to-point topology one could see the initial -1 from fc_host_setup() and after a link unplug our fabric name would turn to 0x0 (with subsequent commit ("zfcp: fix fc_host attributes that should be unknown on local link down") and stay 0x0 on link replug. I did not initialize to 0x0 somewhere even earlier in the code path such that it would not flap from real to 0x0 to real on e.g. an exchange config data with fabric topology. Link: https://lore.kernel.org/r/20200312174505.51294-3-maier@linux.ibm.com Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-12 17:44:57 +00:00
fc_host_fabric_name(shost) = 0;
/* fall through */
default:
scsi: zfcp: expose fabric name as common fc_host sysfs attribute FICON Express8S or older, as well as card features newer than FICON Express16S+ have no certain firmware level requirement. FICON Express16S or FICON Express16S+ have the following minimum firmware level requirements to show a proper fabric name value: z13 machine FICON Express16S , MCL P08424.005 , LIC version 0x00000721 z14 machine FICON Express16S , MCL P42611.008 , LIC version 0x10200069 FICON Express16S+ , MCL P42625.010 , LIC version 0x10300147 Otherwise, the read value is not the fabric name. Each FCP channel of these card features might need one SAN fabric re-login after concurrent microcode update in order to show the proper fabric name. Possible ways to trigger a SAN fabric re-login are one of: Pull fibres between FCP channel port and SAN switch port on either side and re-plug, disable SAN switch port adjacent to FCP channel port and re-enable switch port, or at Service Element toggle off all CHPIDs of FCP channel over all LPARs and toggle CHPIDs on again. Zfcp operating subchannels (FCP devices) on such FCP channel recovers a fabric re-login. Initialize fabric name for any topology and have it an invalid WWPN 0x0 for anything but fabric topology. Otherwise for e.g. point-to-point topology one could see the initial -1 from fc_host_setup() and after a link unplug our fabric name would turn to 0x0 (with subsequent commit ("zfcp: fix fc_host attributes that should be unknown on local link down") and stay 0x0 on link replug. I did not initialize to 0x0 somewhere even earlier in the code path such that it would not flap from real to 0x0 to real on e.g. an exchange config data with fabric topology. Link: https://lore.kernel.org/r/20200312174505.51294-3-maier@linux.ibm.com Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-12 17:44:57 +00:00
fc_host_fabric_name(shost) = 0;
dev_err(&adapter->ccw_device->dev,
"Unknown or unsupported arbitrated loop "
"fibre channel topology detected\n");
zfcp_erp_adapter_shutdown(adapter, 0, "fsece_1");
return -EIO;
}
return 0;
}
static void zfcp_fsf_exchange_config_data_handler(struct zfcp_fsf_req *req)
{
struct zfcp_adapter *adapter = req->adapter;
struct zfcp_diag_header *const diag_hdr =
&adapter->diagnostics->config_data.header;
struct fsf_qtcb *qtcb = req->qtcb;
struct fsf_qtcb_bottom_config *bottom = &qtcb->bottom.config;
struct Scsi_Host *shost = adapter->scsi_host;
if (req->status & ZFCP_STATUS_FSFREQ_ERROR)
return;
adapter->fsf_lic_version = bottom->lic_version;
adapter->adapter_features = bottom->adapter_features;
adapter->connection_features = bottom->connection_features;
adapter->peer_wwpn = 0;
adapter->peer_wwnn = 0;
adapter->peer_d_id = 0;
switch (qtcb->header.fsf_status) {
case FSF_GOOD:
/*
* usually we wait with an update till the cache is too old,
* but because we have the data available, update it anyway
*/
zfcp_diag_update_xdata(diag_hdr, bottom, false);
if (zfcp_fsf_exchange_config_evaluate(req))
return;
if (bottom->max_qtcb_size < sizeof(struct fsf_qtcb)) {
dev_err(&adapter->ccw_device->dev,
"FCP adapter maximum QTCB size (%d bytes) "
"is too small\n",
bottom->max_qtcb_size);
zfcp_erp_adapter_shutdown(adapter, 0, "fsecdh1");
return;
}
atomic_or(ZFCP_STATUS_ADAPTER_XCONFIG_OK,
&adapter->status);
break;
case FSF_EXCHANGE_CONFIG_DATA_INCOMPLETE:
zfcp_diag_update_xdata(diag_hdr, bottom, true);
scsi: zfcp: signal incomplete or error for sync exchange config/port data Adds a new FSF-Request status flag (ZFCP_STATUS_FSFREQ_XDATAINCOMPLETE) that signal that the data received using Exchange Config Data or Exchange Port Data was incomplete. This new flags is set in the respective handlers during the response path. With this patch, only the synchronous FSF-functions for each command got support for the new flag, otherwise it is transparent. Together with this new flag and already existing status flags the synchronous FSF-functions are extended to now detect whether the received data is complete, incomplete or completely invalid (this includes cases where a command ran into a timeout). This is now signaled back to the caller, where previously only failures on the request path would result in a bad return-code. For complete data the return-code remains 0. For incomplete data a new return-code -EAGAIN is added to the function-interface. For completely invalid data the already existing return-code -EIO is reused - formerly this was used to signal failures on the request path. Existing callers of the FSF-functions are adjusted so that they behave as before for return-code 0 and -EAGAIN, to not change the user-interface. As -EIO existed all along, it was already exposed to the user - and needed handling - and will now also be exposed in this new special case. Link: https://lore.kernel.org/r/e14f0702fa2b00a4d1f37c7981a13f2dd1ea2c83.1572018130.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-25 16:12:43 +00:00
req->status |= ZFCP_STATUS_FSFREQ_XDATAINCOMPLETE;
fc_host_node_name(shost) = 0;
fc_host_port_name(shost) = 0;
fc_host_port_id(shost) = 0;
scsi: zfcp: expose fabric name as common fc_host sysfs attribute FICON Express8S or older, as well as card features newer than FICON Express16S+ have no certain firmware level requirement. FICON Express16S or FICON Express16S+ have the following minimum firmware level requirements to show a proper fabric name value: z13 machine FICON Express16S , MCL P08424.005 , LIC version 0x00000721 z14 machine FICON Express16S , MCL P42611.008 , LIC version 0x10200069 FICON Express16S+ , MCL P42625.010 , LIC version 0x10300147 Otherwise, the read value is not the fabric name. Each FCP channel of these card features might need one SAN fabric re-login after concurrent microcode update in order to show the proper fabric name. Possible ways to trigger a SAN fabric re-login are one of: Pull fibres between FCP channel port and SAN switch port on either side and re-plug, disable SAN switch port adjacent to FCP channel port and re-enable switch port, or at Service Element toggle off all CHPIDs of FCP channel over all LPARs and toggle CHPIDs on again. Zfcp operating subchannels (FCP devices) on such FCP channel recovers a fabric re-login. Initialize fabric name for any topology and have it an invalid WWPN 0x0 for anything but fabric topology. Otherwise for e.g. point-to-point topology one could see the initial -1 from fc_host_setup() and after a link unplug our fabric name would turn to 0x0 (with subsequent commit ("zfcp: fix fc_host attributes that should be unknown on local link down") and stay 0x0 on link replug. I did not initialize to 0x0 somewhere even earlier in the code path such that it would not flap from real to 0x0 to real on e.g. an exchange config data with fabric topology. Link: https://lore.kernel.org/r/20200312174505.51294-3-maier@linux.ibm.com Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-12 17:44:57 +00:00
fc_host_fabric_name(shost) = 0;
fc_host_speed(shost) = FC_PORTSPEED_UNKNOWN;
fc_host_port_type(shost) = FC_PORTTYPE_UNKNOWN;
adapter->hydra_version = 0;
/* avoids adapter shutdown to be able to recognize
* events such as LINK UP */
atomic_or(ZFCP_STATUS_ADAPTER_XCONFIG_OK,
&adapter->status);
zfcp_fsf_link_down_info_eval(req,
&qtcb->header.fsf_status_qual.link_down_info);
[SCSI] zfcp: status read buffers on first adapter open with link down Commit 64deb6efdc5504ce97b5c1c6f281fffbc150bd93 "[SCSI] zfcp: Use status_read_buf_num provided by FCP channel" started using a value returned by the channel but only evaluated the value if the fabric link is up. Commit 8d88cf3f3b9af4713642caeb221b6d6a42019001 "[SCSI] zfcp: Update status read mempool" introduced mempool resizings based on the above value. On setting an FCP device online for the very first time since boot, a new zeroed adapter object is allocated. If the link is down, the number of status read requests remains zero. Since just the config data exchange is incomplete, we proceed with adapter open recovery. However, we unconditionally call mempool_resize with adapter->stat_read_buf_num == 0 in this case. This causes a kernel message "kernel BUG at mm/mempool.c:131!" in process "zfcperp<FCP-device-bus-ID>" with last function mempool_resize in Krnl PSW and zfcp_erp_thread in the Call Trace. Don't evaluate channel values which are invalid on link down. The number of status read requests is always valid, evaluated, and set to a positive minimum greater than zero. The adapter open recovery can proceed and the channel has status read buffers to inform us on a future link up event. While we are not aware of any other code path that could result in mempool resize attempts of size zero, we still also initialize the number of status read buffers to be posted to a static minimum number on adapter object allocation. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> #2.6.35+ Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-04-26 15:34:54 +00:00
if (zfcp_fsf_exchange_config_evaluate(req))
return;
break;
default:
zfcp_erp_adapter_shutdown(adapter, 0, "fsecdh3");
return;
}
if (adapter->adapter_features & FSF_FEATURE_HBAAPI_MANAGEMENT) {
adapter->hardware_version = bottom->hardware_version;
memcpy(fc_host_serial_number(shost), bottom->serial_number,
min(FC_SERIAL_NUMBER_SIZE, 17));
EBCASC(fc_host_serial_number(shost),
min(FC_SERIAL_NUMBER_SIZE, 17));
}
if (FSF_QTCB_CURRENT_VERSION < bottom->low_qtcb_version) {
dev_err(&adapter->ccw_device->dev,
"The FCP adapter only supports newer "
"control block versions\n");
zfcp_erp_adapter_shutdown(adapter, 0, "fsecdh4");
return;
}
if (FSF_QTCB_CURRENT_VERSION > bottom->high_qtcb_version) {
dev_err(&adapter->ccw_device->dev,
"The FCP adapter only supports older "
"control block versions\n");
zfcp_erp_adapter_shutdown(adapter, 0, "fsecdh5");
}
}
static void zfcp_fsf_exchange_port_evaluate(struct zfcp_fsf_req *req)
{
struct zfcp_adapter *adapter = req->adapter;
struct fsf_qtcb_bottom_port *bottom = &req->qtcb->bottom.port;
struct Scsi_Host *shost = adapter->scsi_host;
if (req->data)
memcpy(req->data, bottom, sizeof(*bottom));
if (adapter->connection_features & FSF_FEATURE_NPIV_MODE) {
fc_host_permanent_port_name(shost) = bottom->wwpn;
} else
fc_host_permanent_port_name(shost) = fc_host_port_name(shost);
fc_host_maxframe_size(shost) = bottom->maximum_frame_size;
fc_host_supported_speeds(shost) =
zfcp_fsf_convert_portspeed(bottom->supported_speed);
memcpy(fc_host_supported_fc4s(shost), bottom->supported_fc4_types,
FC_FC4_LIST_SIZE);
memcpy(fc_host_active_fc4s(shost), bottom->active_fc4_types,
FC_FC4_LIST_SIZE);
}
static void zfcp_fsf_exchange_port_data_handler(struct zfcp_fsf_req *req)
{
struct zfcp_diag_header *const diag_hdr =
&req->adapter->diagnostics->port_data.header;
struct fsf_qtcb *qtcb = req->qtcb;
struct fsf_qtcb_bottom_port *bottom = &qtcb->bottom.port;
if (req->status & ZFCP_STATUS_FSFREQ_ERROR)
return;
switch (qtcb->header.fsf_status) {
case FSF_GOOD:
/*
* usually we wait with an update till the cache is too old,
* but because we have the data available, update it anyway
*/
zfcp_diag_update_xdata(diag_hdr, bottom, false);
zfcp_fsf_exchange_port_evaluate(req);
break;
case FSF_EXCHANGE_CONFIG_DATA_INCOMPLETE:
zfcp_diag_update_xdata(diag_hdr, bottom, true);
scsi: zfcp: signal incomplete or error for sync exchange config/port data Adds a new FSF-Request status flag (ZFCP_STATUS_FSFREQ_XDATAINCOMPLETE) that signal that the data received using Exchange Config Data or Exchange Port Data was incomplete. This new flags is set in the respective handlers during the response path. With this patch, only the synchronous FSF-functions for each command got support for the new flag, otherwise it is transparent. Together with this new flag and already existing status flags the synchronous FSF-functions are extended to now detect whether the received data is complete, incomplete or completely invalid (this includes cases where a command ran into a timeout). This is now signaled back to the caller, where previously only failures on the request path would result in a bad return-code. For complete data the return-code remains 0. For incomplete data a new return-code -EAGAIN is added to the function-interface. For completely invalid data the already existing return-code -EIO is reused - formerly this was used to signal failures on the request path. Existing callers of the FSF-functions are adjusted so that they behave as before for return-code 0 and -EAGAIN, to not change the user-interface. As -EIO existed all along, it was already exposed to the user - and needed handling - and will now also be exposed in this new special case. Link: https://lore.kernel.org/r/e14f0702fa2b00a4d1f37c7981a13f2dd1ea2c83.1572018130.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-25 16:12:43 +00:00
req->status |= ZFCP_STATUS_FSFREQ_XDATAINCOMPLETE;
zfcp_fsf_exchange_port_evaluate(req);
zfcp_fsf_link_down_info_eval(req,
&qtcb->header.fsf_status_qual.link_down_info);
break;
}
}
static struct zfcp_fsf_req *zfcp_fsf_alloc(mempool_t *pool)
{
struct zfcp_fsf_req *req;
if (likely(pool))
req = mempool_alloc(pool, GFP_ATOMIC);
else
req = kmalloc(sizeof(*req), GFP_ATOMIC);
if (unlikely(!req))
return NULL;
memset(req, 0, sizeof(*req));
req->pool = pool;
return req;
}
scsi: zfcp: consistently use function name space prefix I've been mixing up zfcp_task_mgmt_function() [SCSI] and zfcp_fsf_fcp_task_mgmt() [FSF] so often lately that I wanted to fix this. SCSI changes complement v2.6.27 commit f76af7d7e363 ("[SCSI] zfcp: Cleanup of code in zfcp_scsi.c"). While at it, also fixup the other inconsistencies elsewhere. ERP changes complement v2.6.27 commit 287ac01acf22 ("[SCSI] zfcp: Cleanup code in zfcp_erp.c") which introduced status_change_set(). FC changes complement v2.6.32 commit 6f53a2d2ecae ("[SCSI] zfcp: Apply common naming conventions to zfcp_fc"). by renaming a leftover introduced with v2.6.27 commit cc8c282963bd ("[SCSI] zfcp: Automatically attach remote ports"). FSF changes fixup v2.6.32 commit a4623c467ff7 ("[SCSI] zfcp: Improve request allocation through mempools"). which replaced zfcp_fsf_alloc_qtcb() introduced with v2.6.27 commit c41f8cbddd4e ("[SCSI] zfcp: zfcp_fsf cleanup."). SCSI fc_host statistics were introduced with v2.6.16 commit f6cd94b126aa ("[SCSI] zfcp: transport class adaptations"). SCSI fc_host port_state was introduced with v2.6.27 commit 85a82392fe6f ("[SCSI] zfcp: Add port_state attribute to sysfs"). SCSI rport setter for dev_loss_tmo was introduced with v2.6.18 commit 338151e06608 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable"). Signed-off-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-17 17:14:58 +00:00
static struct fsf_qtcb *zfcp_fsf_qtcb_alloc(mempool_t *pool)
{
struct fsf_qtcb *qtcb;
if (likely(pool))
qtcb = mempool_alloc(pool, GFP_ATOMIC);
else
qtcb = kmem_cache_alloc(zfcp_fsf_qtcb_cache, GFP_ATOMIC);
if (unlikely(!qtcb))
return NULL;
memset(qtcb, 0, sizeof(*qtcb));
return qtcb;
}
static struct zfcp_fsf_req *zfcp_fsf_req_create(struct zfcp_qdio *qdio,
u32 fsf_cmd, u8 sbtype,
mempool_t *pool)
{
struct zfcp_adapter *adapter = qdio->adapter;
struct zfcp_fsf_req *req = zfcp_fsf_alloc(pool);
if (unlikely(!req))
return ERR_PTR(-ENOMEM);
if (adapter->req_no == 0)
adapter->req_no++;
INIT_LIST_HEAD(&req->list);
timer_setup(&req->timer, NULL, 0);
init_completion(&req->completion);
req->adapter = adapter;
req->req_id = adapter->req_no;
if (likely(fsf_cmd != FSF_QTCB_UNSOLICITED_STATUS)) {
if (likely(pool))
scsi: zfcp: consistently use function name space prefix I've been mixing up zfcp_task_mgmt_function() [SCSI] and zfcp_fsf_fcp_task_mgmt() [FSF] so often lately that I wanted to fix this. SCSI changes complement v2.6.27 commit f76af7d7e363 ("[SCSI] zfcp: Cleanup of code in zfcp_scsi.c"). While at it, also fixup the other inconsistencies elsewhere. ERP changes complement v2.6.27 commit 287ac01acf22 ("[SCSI] zfcp: Cleanup code in zfcp_erp.c") which introduced status_change_set(). FC changes complement v2.6.32 commit 6f53a2d2ecae ("[SCSI] zfcp: Apply common naming conventions to zfcp_fc"). by renaming a leftover introduced with v2.6.27 commit cc8c282963bd ("[SCSI] zfcp: Automatically attach remote ports"). FSF changes fixup v2.6.32 commit a4623c467ff7 ("[SCSI] zfcp: Improve request allocation through mempools"). which replaced zfcp_fsf_alloc_qtcb() introduced with v2.6.27 commit c41f8cbddd4e ("[SCSI] zfcp: zfcp_fsf cleanup."). SCSI fc_host statistics were introduced with v2.6.16 commit f6cd94b126aa ("[SCSI] zfcp: transport class adaptations"). SCSI fc_host port_state was introduced with v2.6.27 commit 85a82392fe6f ("[SCSI] zfcp: Add port_state attribute to sysfs"). SCSI rport setter for dev_loss_tmo was introduced with v2.6.18 commit 338151e06608 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable"). Signed-off-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-17 17:14:58 +00:00
req->qtcb = zfcp_fsf_qtcb_alloc(
adapter->pool.qtcb_pool);
else
scsi: zfcp: consistently use function name space prefix I've been mixing up zfcp_task_mgmt_function() [SCSI] and zfcp_fsf_fcp_task_mgmt() [FSF] so often lately that I wanted to fix this. SCSI changes complement v2.6.27 commit f76af7d7e363 ("[SCSI] zfcp: Cleanup of code in zfcp_scsi.c"). While at it, also fixup the other inconsistencies elsewhere. ERP changes complement v2.6.27 commit 287ac01acf22 ("[SCSI] zfcp: Cleanup code in zfcp_erp.c") which introduced status_change_set(). FC changes complement v2.6.32 commit 6f53a2d2ecae ("[SCSI] zfcp: Apply common naming conventions to zfcp_fc"). by renaming a leftover introduced with v2.6.27 commit cc8c282963bd ("[SCSI] zfcp: Automatically attach remote ports"). FSF changes fixup v2.6.32 commit a4623c467ff7 ("[SCSI] zfcp: Improve request allocation through mempools"). which replaced zfcp_fsf_alloc_qtcb() introduced with v2.6.27 commit c41f8cbddd4e ("[SCSI] zfcp: zfcp_fsf cleanup."). SCSI fc_host statistics were introduced with v2.6.16 commit f6cd94b126aa ("[SCSI] zfcp: transport class adaptations"). SCSI fc_host port_state was introduced with v2.6.27 commit 85a82392fe6f ("[SCSI] zfcp: Add port_state attribute to sysfs"). SCSI rport setter for dev_loss_tmo was introduced with v2.6.18 commit 338151e06608 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable"). Signed-off-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-17 17:14:58 +00:00
req->qtcb = zfcp_fsf_qtcb_alloc(NULL);
if (unlikely(!req->qtcb)) {
zfcp_fsf_req_free(req);
return ERR_PTR(-ENOMEM);
}
req->qtcb->prefix.req_seq_no = adapter->fsf_req_seq_no;
req->qtcb->prefix.req_id = req->req_id;
req->qtcb->prefix.ulp_info = 26;
scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header Status read buffers (SRBs, unsolicited notifications) never use a QTCB [zfcp_fsf_req_create()]. zfcp_fsf_req_send() already uses this to distinguish SRBs from other FSF request types. We can re-use this method in zfcp_fsf_req_complete(). Introduce a helper function to make the check for req->qtcb less magic. SRBs always are FSF_QTCB_UNSOLICITED_STATUS, so we can hard-code this for the two trace functions dealing with SRBs. All other FSF request types have a QTCB and we can get the fsf_command from there. zfcp_dbf_hba_fsf_response() and thus zfcp_dbf_hba_fsf_res() are only called for non-SRB requests so it's safe to dereference the QTCB [zfcp_fsf_req_complete() returns early on SRB, else calls zfcp_fsf_protstatus_eval() which calls zfcp_dbf_hba_fsf_response()]. In zfcp_scsi_forget_cmnd() we guard the QTCB dereference with a preceding NULL check and rely on boolean shortcut evaluation. As a side effect, this causes an alignment hole which we can close in a later patch after having cleaned up all fields of struct zfcp_fsf_req. Before: $ pahole -C zfcp_fsf_req drivers/s390/scsi/zfcp.ko ... u32 status; /* 136 4 */ u32 fsf_command; /* 140 4 */ struct fsf_qtcb * qtcb; /* 144 8 */ ... After: $ pahole -C zfcp_fsf_req drivers/s390/scsi/zfcp.ko ... u32 status; /* 136 4 */ /* XXX 4 bytes hole, try to pack */ struct fsf_qtcb * qtcb; /* 144 8 */ ... Signed-off-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-08 14:44:45 +00:00
req->qtcb->prefix.qtcb_type = fsf_qtcb_type[fsf_cmd];
req->qtcb->prefix.qtcb_version = FSF_QTCB_CURRENT_VERSION;
req->qtcb->header.req_handle = req->req_id;
scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header Status read buffers (SRBs, unsolicited notifications) never use a QTCB [zfcp_fsf_req_create()]. zfcp_fsf_req_send() already uses this to distinguish SRBs from other FSF request types. We can re-use this method in zfcp_fsf_req_complete(). Introduce a helper function to make the check for req->qtcb less magic. SRBs always are FSF_QTCB_UNSOLICITED_STATUS, so we can hard-code this for the two trace functions dealing with SRBs. All other FSF request types have a QTCB and we can get the fsf_command from there. zfcp_dbf_hba_fsf_response() and thus zfcp_dbf_hba_fsf_res() are only called for non-SRB requests so it's safe to dereference the QTCB [zfcp_fsf_req_complete() returns early on SRB, else calls zfcp_fsf_protstatus_eval() which calls zfcp_dbf_hba_fsf_response()]. In zfcp_scsi_forget_cmnd() we guard the QTCB dereference with a preceding NULL check and rely on boolean shortcut evaluation. As a side effect, this causes an alignment hole which we can close in a later patch after having cleaned up all fields of struct zfcp_fsf_req. Before: $ pahole -C zfcp_fsf_req drivers/s390/scsi/zfcp.ko ... u32 status; /* 136 4 */ u32 fsf_command; /* 140 4 */ struct fsf_qtcb * qtcb; /* 144 8 */ ... After: $ pahole -C zfcp_fsf_req drivers/s390/scsi/zfcp.ko ... u32 status; /* 136 4 */ /* XXX 4 bytes hole, try to pack */ struct fsf_qtcb * qtcb; /* 144 8 */ ... Signed-off-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-08 14:44:45 +00:00
req->qtcb->header.fsf_command = fsf_cmd;
}
zfcp_qdio_req_init(adapter->qdio, &req->qdio_req, req->req_id, sbtype,
req->qtcb, sizeof(struct fsf_qtcb));
return req;
}
static int zfcp_fsf_req_send(struct zfcp_fsf_req *req)
{
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
const bool is_srb = zfcp_fsf_req_is_status_read_buffer(req);
struct zfcp_adapter *adapter = req->adapter;
struct zfcp_qdio *qdio = adapter->qdio;
int req_id = req->req_id;
zfcp_reqlist_add(adapter->req_list, req);
req->qdio_req.qdio_outb_usage = atomic_read(&qdio->req_q_free);
req->issued = get_tod_clock();
if (zfcp_qdio_send(qdio, &req->qdio_req)) {
del_timer(&req->timer);
/* lookup request again, list might have changed */
zfcp_reqlist_find_rm(adapter->req_list, req_id);
zfcp_erp_adapter_reopen(adapter, 0, "fsrs__1");
return -EIO;
}
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
/*
* NOTE: DO NOT TOUCH ASYNC req PAST THIS POINT.
* ONLY TOUCH SYNC req AGAIN ON req->completion.
*
* The request might complete and be freed concurrently at any point
* now. This is not protected by the QDIO-lock (req_q_lock). So any
* uncontrolled access after this might result in an use-after-free bug.
* Only if the request doesn't have ZFCP_STATUS_FSFREQ_CLEANUP set, and
* when it is completed via req->completion, is it safe to use req
* again.
*/
/* Don't increase for unsolicited status */
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
if (!is_srb)
adapter->fsf_req_seq_no++;
adapter->req_no++;
return 0;
}
/**
* zfcp_fsf_status_read - send status read request
* @qdio: pointer to struct zfcp_qdio
* Returns: 0 on success, ERROR otherwise
*/
int zfcp_fsf_status_read(struct zfcp_qdio *qdio)
{
struct zfcp_adapter *adapter = qdio->adapter;
struct zfcp_fsf_req *req;
struct fsf_status_read_buffer *sr_buf;
struct page *page;
int retval = -EIO;
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_lock_irq(&qdio->req_q_lock);
if (zfcp_qdio_sbal_get(qdio))
goto out;
req = zfcp_fsf_req_create(qdio, FSF_QTCB_UNSOLICITED_STATUS,
SBAL_SFLAGS0_TYPE_STATUS,
adapter->pool.status_read_req);
if (IS_ERR(req)) {
retval = PTR_ERR(req);
goto out;
}
page = mempool_alloc(adapter->pool.sr_data, GFP_ATOMIC);
if (!page) {
retval = -ENOMEM;
goto failed_buf;
}
sr_buf = page_address(page);
memset(sr_buf, 0, sizeof(*sr_buf));
req->data = sr_buf;
zfcp_qdio_fill_next(qdio, &req->qdio_req, sr_buf, sizeof(*sr_buf));
zfcp_qdio_set_sbale_last(qdio, &req->qdio_req);
retval = zfcp_fsf_req_send(req);
if (retval)
goto failed_req_send;
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
/* NOTE: DO NOT TOUCH req PAST THIS POINT! */
goto out;
failed_req_send:
req->data = NULL;
mempool_free(virt_to_page(sr_buf), adapter->pool.sr_data);
failed_buf:
zfcp_dbf_hba_fsf_uss("fssr__1", req);
zfcp_fsf_req_free(req);
out:
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_unlock_irq(&qdio->req_q_lock);
return retval;
}
static void zfcp_fsf_abort_fcp_command_handler(struct zfcp_fsf_req *req)
{
struct scsi_device *sdev = req->data;
2012-09-04 13:23:36 +00:00
struct zfcp_scsi_dev *zfcp_sdev;
union fsf_status_qual *fsq = &req->qtcb->header.fsf_status_qual;
if (req->status & ZFCP_STATUS_FSFREQ_ERROR)
return;
2012-09-04 13:23:36 +00:00
zfcp_sdev = sdev_to_zfcp(sdev);
switch (req->qtcb->header.fsf_status) {
case FSF_PORT_HANDLE_NOT_VALID:
if (fsq->word[0] == fsq->word[1]) {
zfcp_erp_adapter_reopen(zfcp_sdev->port->adapter, 0,
"fsafch1");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
}
break;
case FSF_LUN_HANDLE_NOT_VALID:
if (fsq->word[0] == fsq->word[1]) {
zfcp_erp_port_reopen(zfcp_sdev->port, 0, "fsafch2");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
}
break;
case FSF_FCP_COMMAND_DOES_NOT_EXIST:
req->status |= ZFCP_STATUS_FSFREQ_ABORTNOTNEEDED;
break;
case FSF_PORT_BOXED:
zfcp_erp_set_port_status(zfcp_sdev->port,
ZFCP_STATUS_COMMON_ACCESS_BOXED);
zfcp_erp_port_reopen(zfcp_sdev->port,
ZFCP_STATUS_COMMON_ERP_FAILED, "fsafch3");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_LUN_BOXED:
zfcp_erp_set_lun_status(sdev, ZFCP_STATUS_COMMON_ACCESS_BOXED);
zfcp_erp_lun_reopen(sdev, ZFCP_STATUS_COMMON_ERP_FAILED,
"fsafch4");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_ADAPTER_STATUS_AVAILABLE:
switch (fsq->word[0]) {
case FSF_SQ_INVOKE_LINK_TEST_PROCEDURE:
zfcp_fc_test_link(zfcp_sdev->port);
/* fall through */
case FSF_SQ_ULP_DEPENDENT_ERP_REQUIRED:
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
}
break;
case FSF_GOOD:
req->status |= ZFCP_STATUS_FSFREQ_ABORTSUCCEEDED;
break;
}
}
/**
* zfcp_fsf_abort_fcp_cmnd - abort running SCSI command
* @scmnd: The SCSI command to abort
* Returns: pointer to struct zfcp_fsf_req
*/
struct zfcp_fsf_req *zfcp_fsf_abort_fcp_cmnd(struct scsi_cmnd *scmnd)
{
struct zfcp_fsf_req *req = NULL;
struct scsi_device *sdev = scmnd->device;
struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);
struct zfcp_qdio *qdio = zfcp_sdev->port->adapter->qdio;
unsigned long old_req_id = (unsigned long) scmnd->host_scribble;
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_lock_irq(&qdio->req_q_lock);
if (zfcp_qdio_sbal_get(qdio))
goto out;
req = zfcp_fsf_req_create(qdio, FSF_QTCB_ABORT_FCP_CMND,
SBAL_SFLAGS0_TYPE_READ,
qdio->adapter->pool.scsi_abort);
if (IS_ERR(req)) {
req = NULL;
goto out;
}
if (unlikely(!(atomic_read(&zfcp_sdev->status) &
ZFCP_STATUS_COMMON_UNBLOCKED)))
goto out_error_free;
zfcp_qdio_set_sbale_last(qdio, &req->qdio_req);
req->data = sdev;
req->handler = zfcp_fsf_abort_fcp_command_handler;
req->qtcb->header.lun_handle = zfcp_sdev->lun_handle;
req->qtcb->header.port_handle = zfcp_sdev->port->handle;
req->qtcb->bottom.support.req_handle = (u64) old_req_id;
zfcp_fsf_start_timer(req, ZFCP_FSF_SCSI_ER_TIMEOUT);
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
if (!zfcp_fsf_req_send(req)) {
/* NOTE: DO NOT TOUCH req, UNTIL IT COMPLETES! */
goto out;
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
}
out_error_free:
zfcp_fsf_req_free(req);
req = NULL;
out:
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_unlock_irq(&qdio->req_q_lock);
return req;
}
static void zfcp_fsf_send_ct_handler(struct zfcp_fsf_req *req)
{
struct zfcp_adapter *adapter = req->adapter;
struct zfcp_fsf_ct_els *ct = req->data;
struct fsf_qtcb_header *header = &req->qtcb->header;
ct->status = -EINVAL;
if (req->status & ZFCP_STATUS_FSFREQ_ERROR)
goto skip_fsfstatus;
switch (header->fsf_status) {
case FSF_GOOD:
ct->status = 0;
scsi: zfcp: fix capping of unsuccessful GPN_FT SAN response trace records v4.9 commit aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)") fixed trace data loss of 2.6.38 commit 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") necessary for problem determination, e.g. to see the currently active zone set during automatic port scan. While it already saves space by not dumping any empty residual entries of the large successful GPN_FT response (4 pages), there are seldom cases where the GPN_FT response is unsuccessful and likely does not have FC_NS_FID_LAST set in fp_flags so we did not cap the trace record. We typically see such case for an initiator WWPN, which is not in any zone. Cap unsuccessful responses to at least the actual basic CT_IU response plus whatever fits the SAN trace record built-in "payload" buffer just in case there's trailing information of which we would at least see the existence and its beginning. In order not to erroneously cap successful responses, we need to swap calling the trace function and setting the CT / ELS status to success (0). Example trace record pair formatted with zfcpdbf: Timestamp : ... Area : SAN Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : fssct_1 Request ID : 0x<request_id> Destination ID : 0x00fffffc SAN req short : 01000000 fc020000 01720ffc 00000000 00000008 SAN req length : 20 | Timestamp : ... Area : SAN Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 2 Tag : fsscth2 Request ID : 0x<request_id> Destination ID : 0x00fffffc SAN resp short : 01000000 fc020000 80010000 00090700 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] SAN resp length: 16384 San resp info : 01000000 fc020000 80010000 00090700 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] The fix saves all but one of the previously associated 64 PAYload trace record chunks of size 256 bytes each. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)") Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-07-28 10:30:53 +00:00
zfcp_dbf_san_res("fsscth2", req);
break;
case FSF_SERVICE_CLASS_NOT_SUPPORTED:
zfcp_fsf_class_not_supp(req);
break;
case FSF_ADAPTER_STATUS_AVAILABLE:
switch (header->fsf_status_qual.word[0]){
case FSF_SQ_INVOKE_LINK_TEST_PROCEDURE:
case FSF_SQ_ULP_DEPENDENT_ERP_REQUIRED:
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
}
break;
case FSF_PORT_BOXED:
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_PORT_HANDLE_NOT_VALID:
zfcp_erp_adapter_reopen(adapter, 0, "fsscth1");
/* fall through */
case FSF_GENERIC_COMMAND_REJECTED:
case FSF_PAYLOAD_SIZE_MISMATCH:
case FSF_REQUEST_SIZE_TOO_LARGE:
case FSF_RESPONSE_SIZE_TOO_LARGE:
case FSF_SBAL_MISMATCH:
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
}
skip_fsfstatus:
if (ct->handler)
ct->handler(ct->handler_data);
}
static void zfcp_fsf_setup_ct_els_unchained(struct zfcp_qdio *qdio,
struct zfcp_qdio_req *q_req,
struct scatterlist *sg_req,
struct scatterlist *sg_resp)
{
zfcp_qdio_fill_next(qdio, q_req, sg_virt(sg_req), sg_req->length);
zfcp_qdio_fill_next(qdio, q_req, sg_virt(sg_resp), sg_resp->length);
zfcp_qdio_set_sbale_last(qdio, q_req);
}
static int zfcp_fsf_setup_ct_els_sbals(struct zfcp_fsf_req *req,
struct scatterlist *sg_req,
struct scatterlist *sg_resp)
{
struct zfcp_adapter *adapter = req->adapter;
struct zfcp_qdio *qdio = adapter->qdio;
struct fsf_qtcb *qtcb = req->qtcb;
u32 feat = adapter->adapter_features;
if (zfcp_adapter_multi_buffer_active(adapter)) {
if (zfcp_qdio_sbals_from_sg(qdio, &req->qdio_req, sg_req))
return -EIO;
zfcp: fix ELS/GS request&response length for hardware data router In the hardware data router case, introduced with kernel 3.2 commit 86a9668a8d29 ("[SCSI] zfcp: support for hardware data router") the ELS/GS request&response length needs to be initialized as in the chained SBAL case. Otherwise, the FCP channel rejects ELS requests with FSF_REQUEST_SIZE_TOO_LARGE. Such ELS requests can be issued by user space through BSG / HBA API, or zfcp itself uses ADISC ELS for remote port link test on RSCN. The latter can cause a short path outage due to unnecessary remote target port recovery because the always failing ADISC cannot detect extremely short path interruptions beyond the local FCP channel. Below example is decoded with zfcpdbf from s390-tools: Timestamp : ... Area : SAN Subarea : 00 Level : 1 Exception : - CPU id : .. Caller : zfcp_dbf_san_req+0408 Record id : 1 Tag : fssels1 Request id : 0x<reqid> Destination ID : 0x00<target d_id> Payload info : 52000000 00000000 <our wwpn > [ADISC] <our wwnn > 00<s_id> 00000000 00000000 00000000 00000000 00000000 Timestamp : ... Area : HBA Subarea : 00 Level : 1 Exception : - CPU id : .. Caller : zfcp_dbf_hba_fsf_res+0740 Record id : 1 Tag : fs_ferr Request id : 0x<reqid> Request status : 0x00000010 FSF cmnd : 0x0000000b [FSF_QTCB_SEND_ELS] FSF sequence no: 0x... FSF issued : ... FSF stat : 0x00000061 [FSF_REQUEST_SIZE_TOO_LARGE] FSF stat qual : 00000000 00000000 00000000 00000000 Prot stat : 0x00000100 Prot stat qual : 00000000 00000000 00000000 00000000 Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 86a9668a8d29 ("[SCSI] zfcp: support for hardware data router") Cc: <stable@vger.kernel.org> # 3.2+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-08-10 16:30:45 +00:00
qtcb->bottom.support.req_buf_length =
zfcp_qdio_real_bytes(sg_req);
if (zfcp_qdio_sbals_from_sg(qdio, &req->qdio_req, sg_resp))
return -EIO;
zfcp: fix ELS/GS request&response length for hardware data router In the hardware data router case, introduced with kernel 3.2 commit 86a9668a8d29 ("[SCSI] zfcp: support for hardware data router") the ELS/GS request&response length needs to be initialized as in the chained SBAL case. Otherwise, the FCP channel rejects ELS requests with FSF_REQUEST_SIZE_TOO_LARGE. Such ELS requests can be issued by user space through BSG / HBA API, or zfcp itself uses ADISC ELS for remote port link test on RSCN. The latter can cause a short path outage due to unnecessary remote target port recovery because the always failing ADISC cannot detect extremely short path interruptions beyond the local FCP channel. Below example is decoded with zfcpdbf from s390-tools: Timestamp : ... Area : SAN Subarea : 00 Level : 1 Exception : - CPU id : .. Caller : zfcp_dbf_san_req+0408 Record id : 1 Tag : fssels1 Request id : 0x<reqid> Destination ID : 0x00<target d_id> Payload info : 52000000 00000000 <our wwpn > [ADISC] <our wwnn > 00<s_id> 00000000 00000000 00000000 00000000 00000000 Timestamp : ... Area : HBA Subarea : 00 Level : 1 Exception : - CPU id : .. Caller : zfcp_dbf_hba_fsf_res+0740 Record id : 1 Tag : fs_ferr Request id : 0x<reqid> Request status : 0x00000010 FSF cmnd : 0x0000000b [FSF_QTCB_SEND_ELS] FSF sequence no: 0x... FSF issued : ... FSF stat : 0x00000061 [FSF_REQUEST_SIZE_TOO_LARGE] FSF stat qual : 00000000 00000000 00000000 00000000 Prot stat : 0x00000100 Prot stat qual : 00000000 00000000 00000000 00000000 Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 86a9668a8d29 ("[SCSI] zfcp: support for hardware data router") Cc: <stable@vger.kernel.org> # 3.2+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-08-10 16:30:45 +00:00
qtcb->bottom.support.resp_buf_length =
zfcp_qdio_real_bytes(sg_resp);
zfcp_qdio_set_data_div(qdio, &req->qdio_req, sg_nents(sg_req));
zfcp_qdio_set_sbale_last(qdio, &req->qdio_req);
zfcp_qdio_set_scount(qdio, &req->qdio_req);
return 0;
}
/* use single, unchained SBAL if it can hold the request */
if (zfcp_qdio_sg_one_sbale(sg_req) && zfcp_qdio_sg_one_sbale(sg_resp)) {
zfcp_fsf_setup_ct_els_unchained(qdio, &req->qdio_req,
sg_req, sg_resp);
return 0;
}
if (!(feat & FSF_FEATURE_ELS_CT_CHAINED_SBALS))
return -EOPNOTSUPP;
if (zfcp_qdio_sbals_from_sg(qdio, &req->qdio_req, sg_req))
return -EIO;
qtcb->bottom.support.req_buf_length = zfcp_qdio_real_bytes(sg_req);
zfcp_qdio_set_sbale_last(qdio, &req->qdio_req);
zfcp_qdio_skip_to_last_sbale(qdio, &req->qdio_req);
if (zfcp_qdio_sbals_from_sg(qdio, &req->qdio_req, sg_resp))
return -EIO;
qtcb->bottom.support.resp_buf_length = zfcp_qdio_real_bytes(sg_resp);
zfcp_qdio_set_sbale_last(qdio, &req->qdio_req);
return 0;
}
static int zfcp_fsf_setup_ct_els(struct zfcp_fsf_req *req,
struct scatterlist *sg_req,
struct scatterlist *sg_resp,
unsigned int timeout)
{
int ret;
ret = zfcp_fsf_setup_ct_els_sbals(req, sg_req, sg_resp);
if (ret)
return ret;
/* common settings for ct/gs and els requests */
if (timeout > 255)
timeout = 255; /* max value accepted by hardware */
req->qtcb->bottom.support.service_class = FSF_CLASS_3;
req->qtcb->bottom.support.timeout = timeout;
zfcp_fsf_start_timer(req, (timeout + 10) * HZ);
return 0;
}
/**
* zfcp_fsf_send_ct - initiate a Generic Service request (FC-GS)
* @wka_port: pointer to zfcp WKA port to send CT/GS to
* @ct: pointer to struct zfcp_send_ct with data for request
* @pool: if non-null this mempool is used to allocate struct zfcp_fsf_req
* @timeout: timeout that hardware should use, and a later software timeout
*/
int zfcp_fsf_send_ct(struct zfcp_fc_wka_port *wka_port,
struct zfcp_fsf_ct_els *ct, mempool_t *pool,
unsigned int timeout)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
int ret = -EIO;
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_lock_irq(&qdio->req_q_lock);
if (zfcp_qdio_sbal_get(qdio))
goto out;
req = zfcp_fsf_req_create(qdio, FSF_QTCB_SEND_GENERIC,
SBAL_SFLAGS0_TYPE_WRITE_READ, pool);
if (IS_ERR(req)) {
ret = PTR_ERR(req);
goto out;
}
req->status |= ZFCP_STATUS_FSFREQ_CLEANUP;
ret = zfcp_fsf_setup_ct_els(req, ct->req, ct->resp, timeout);
if (ret)
goto failed_send;
req->handler = zfcp_fsf_send_ct_handler;
req->qtcb->header.port_handle = wka_port->handle;
ct->d_id = wka_port->d_id;
req->data = ct;
zfcp_dbf_san_req("fssct_1", req, wka_port->d_id);
ret = zfcp_fsf_req_send(req);
if (ret)
goto failed_send;
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
/* NOTE: DO NOT TOUCH req PAST THIS POINT! */
goto out;
failed_send:
zfcp_fsf_req_free(req);
out:
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_unlock_irq(&qdio->req_q_lock);
return ret;
}
static void zfcp_fsf_send_els_handler(struct zfcp_fsf_req *req)
{
struct zfcp_fsf_ct_els *send_els = req->data;
struct fsf_qtcb_header *header = &req->qtcb->header;
send_els->status = -EINVAL;
if (req->status & ZFCP_STATUS_FSFREQ_ERROR)
goto skip_fsfstatus;
switch (header->fsf_status) {
case FSF_GOOD:
send_els->status = 0;
scsi: zfcp: fix capping of unsuccessful GPN_FT SAN response trace records v4.9 commit aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)") fixed trace data loss of 2.6.38 commit 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") necessary for problem determination, e.g. to see the currently active zone set during automatic port scan. While it already saves space by not dumping any empty residual entries of the large successful GPN_FT response (4 pages), there are seldom cases where the GPN_FT response is unsuccessful and likely does not have FC_NS_FID_LAST set in fp_flags so we did not cap the trace record. We typically see such case for an initiator WWPN, which is not in any zone. Cap unsuccessful responses to at least the actual basic CT_IU response plus whatever fits the SAN trace record built-in "payload" buffer just in case there's trailing information of which we would at least see the existence and its beginning. In order not to erroneously cap successful responses, we need to swap calling the trace function and setting the CT / ELS status to success (0). Example trace record pair formatted with zfcpdbf: Timestamp : ... Area : SAN Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : fssct_1 Request ID : 0x<request_id> Destination ID : 0x00fffffc SAN req short : 01000000 fc020000 01720ffc 00000000 00000008 SAN req length : 20 | Timestamp : ... Area : SAN Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 2 Tag : fsscth2 Request ID : 0x<request_id> Destination ID : 0x00fffffc SAN resp short : 01000000 fc020000 80010000 00090700 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] SAN resp length: 16384 San resp info : 01000000 fc020000 80010000 00090700 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] The fix saves all but one of the previously associated 64 PAYload trace record chunks of size 256 bytes each. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)") Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-07-28 10:30:53 +00:00
zfcp_dbf_san_res("fsselh1", req);
break;
case FSF_SERVICE_CLASS_NOT_SUPPORTED:
zfcp_fsf_class_not_supp(req);
break;
case FSF_ADAPTER_STATUS_AVAILABLE:
switch (header->fsf_status_qual.word[0]){
case FSF_SQ_INVOKE_LINK_TEST_PROCEDURE:
case FSF_SQ_ULP_DEPENDENT_ERP_REQUIRED:
case FSF_SQ_RETRY_IF_POSSIBLE:
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
}
break;
case FSF_ELS_COMMAND_REJECTED:
case FSF_PAYLOAD_SIZE_MISMATCH:
case FSF_REQUEST_SIZE_TOO_LARGE:
case FSF_RESPONSE_SIZE_TOO_LARGE:
break;
case FSF_SBAL_MISMATCH:
/* should never occur, avoided in zfcp_fsf_send_els */
/* fall through */
default:
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
}
skip_fsfstatus:
if (send_els->handler)
send_els->handler(send_els->handler_data);
}
/**
* zfcp_fsf_send_els - initiate an ELS command (FC-FS)
* @adapter: pointer to zfcp adapter
* @d_id: N_Port_ID to send ELS to
* @els: pointer to struct zfcp_send_els with data for the command
* @timeout: timeout that hardware should use, and a later software timeout
*/
int zfcp_fsf_send_els(struct zfcp_adapter *adapter, u32 d_id,
struct zfcp_fsf_ct_els *els, unsigned int timeout)
{
struct zfcp_fsf_req *req;
struct zfcp_qdio *qdio = adapter->qdio;
int ret = -EIO;
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_lock_irq(&qdio->req_q_lock);
if (zfcp_qdio_sbal_get(qdio))
goto out;
req = zfcp_fsf_req_create(qdio, FSF_QTCB_SEND_ELS,
SBAL_SFLAGS0_TYPE_WRITE_READ, NULL);
if (IS_ERR(req)) {
ret = PTR_ERR(req);
goto out;
}
req->status |= ZFCP_STATUS_FSFREQ_CLEANUP;
if (!zfcp_adapter_multi_buffer_active(adapter))
zfcp_qdio_sbal_limit(qdio, &req->qdio_req, 2);
ret = zfcp_fsf_setup_ct_els(req, els->req, els->resp, timeout);
if (ret)
goto failed_send;
hton24(req->qtcb->bottom.support.d_id, d_id);
req->handler = zfcp_fsf_send_els_handler;
els->d_id = d_id;
req->data = els;
zfcp_dbf_san_req("fssels1", req, d_id);
ret = zfcp_fsf_req_send(req);
if (ret)
goto failed_send;
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
/* NOTE: DO NOT TOUCH req PAST THIS POINT! */
goto out;
failed_send:
zfcp_fsf_req_free(req);
out:
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_unlock_irq(&qdio->req_q_lock);
return ret;
}
int zfcp_fsf_exchange_config_data(struct zfcp_erp_action *erp_action)
{
struct zfcp_fsf_req *req;
struct zfcp_qdio *qdio = erp_action->adapter->qdio;
int retval = -EIO;
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_lock_irq(&qdio->req_q_lock);
if (zfcp_qdio_sbal_get(qdio))
goto out;
req = zfcp_fsf_req_create(qdio, FSF_QTCB_EXCHANGE_CONFIG_DATA,
SBAL_SFLAGS0_TYPE_READ,
qdio->adapter->pool.erp_req);
if (IS_ERR(req)) {
retval = PTR_ERR(req);
goto out;
}
req->status |= ZFCP_STATUS_FSFREQ_CLEANUP;
zfcp_qdio_set_sbale_last(qdio, &req->qdio_req);
req->qtcb->bottom.config.feature_selection =
FSF_FEATURE_NOTIFICATION_LOST |
FSF_FEATURE_UPDATE_ALERT |
FSF_FEATURE_REQUEST_SFP_DATA;
req->erp_action = erp_action;
req->handler = zfcp_fsf_exchange_config_data_handler;
erp_action->fsf_req_id = req->req_id;
zfcp_fsf_start_erp_timer(req);
retval = zfcp_fsf_req_send(req);
if (retval) {
zfcp_fsf_req_free(req);
erp_action->fsf_req_id = 0;
}
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
/* NOTE: DO NOT TOUCH req PAST THIS POINT! */
out:
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_unlock_irq(&qdio->req_q_lock);
return retval;
}
scsi: zfcp: signal incomplete or error for sync exchange config/port data Adds a new FSF-Request status flag (ZFCP_STATUS_FSFREQ_XDATAINCOMPLETE) that signal that the data received using Exchange Config Data or Exchange Port Data was incomplete. This new flags is set in the respective handlers during the response path. With this patch, only the synchronous FSF-functions for each command got support for the new flag, otherwise it is transparent. Together with this new flag and already existing status flags the synchronous FSF-functions are extended to now detect whether the received data is complete, incomplete or completely invalid (this includes cases where a command ran into a timeout). This is now signaled back to the caller, where previously only failures on the request path would result in a bad return-code. For complete data the return-code remains 0. For incomplete data a new return-code -EAGAIN is added to the function-interface. For completely invalid data the already existing return-code -EIO is reused - formerly this was used to signal failures on the request path. Existing callers of the FSF-functions are adjusted so that they behave as before for return-code 0 and -EAGAIN, to not change the user-interface. As -EIO existed all along, it was already exposed to the user - and needed handling - and will now also be exposed in this new special case. Link: https://lore.kernel.org/r/e14f0702fa2b00a4d1f37c7981a13f2dd1ea2c83.1572018130.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-25 16:12:43 +00:00
/**
* zfcp_fsf_exchange_config_data_sync() - Request information about FCP channel.
* @qdio: pointer to the QDIO-Queue to use for sending the command.
* @data: pointer to the QTCB-Bottom for storing the result of the command,
* might be %NULL.
*
* Returns:
* * 0 - Exchange Config Data was successful, @data is complete
* * -EIO - Exchange Config Data was not successful, @data is invalid
* * -EAGAIN - @data contains incomplete data
* * -ENOMEM - Some memory allocation failed along the way
*/
int zfcp_fsf_exchange_config_data_sync(struct zfcp_qdio *qdio,
struct fsf_qtcb_bottom_config *data)
{
struct zfcp_fsf_req *req = NULL;
int retval = -EIO;
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_lock_irq(&qdio->req_q_lock);
if (zfcp_qdio_sbal_get(qdio))
goto out_unlock;
req = zfcp_fsf_req_create(qdio, FSF_QTCB_EXCHANGE_CONFIG_DATA,
SBAL_SFLAGS0_TYPE_READ, NULL);
if (IS_ERR(req)) {
retval = PTR_ERR(req);
goto out_unlock;
}
zfcp_qdio_set_sbale_last(qdio, &req->qdio_req);
req->handler = zfcp_fsf_exchange_config_data_handler;
req->qtcb->bottom.config.feature_selection =
FSF_FEATURE_NOTIFICATION_LOST |
FSF_FEATURE_UPDATE_ALERT |
FSF_FEATURE_REQUEST_SFP_DATA;
if (data)
req->data = data;
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_unlock_irq(&qdio->req_q_lock);
scsi: zfcp: signal incomplete or error for sync exchange config/port data Adds a new FSF-Request status flag (ZFCP_STATUS_FSFREQ_XDATAINCOMPLETE) that signal that the data received using Exchange Config Data or Exchange Port Data was incomplete. This new flags is set in the respective handlers during the response path. With this patch, only the synchronous FSF-functions for each command got support for the new flag, otherwise it is transparent. Together with this new flag and already existing status flags the synchronous FSF-functions are extended to now detect whether the received data is complete, incomplete or completely invalid (this includes cases where a command ran into a timeout). This is now signaled back to the caller, where previously only failures on the request path would result in a bad return-code. For complete data the return-code remains 0. For incomplete data a new return-code -EAGAIN is added to the function-interface. For completely invalid data the already existing return-code -EIO is reused - formerly this was used to signal failures on the request path. Existing callers of the FSF-functions are adjusted so that they behave as before for return-code 0 and -EAGAIN, to not change the user-interface. As -EIO existed all along, it was already exposed to the user - and needed handling - and will now also be exposed in this new special case. Link: https://lore.kernel.org/r/e14f0702fa2b00a4d1f37c7981a13f2dd1ea2c83.1572018130.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-25 16:12:43 +00:00
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
if (!retval) {
/* NOTE: ONLY TOUCH SYNC req AGAIN ON req->completion. */
wait_for_completion(&req->completion);
scsi: zfcp: signal incomplete or error for sync exchange config/port data Adds a new FSF-Request status flag (ZFCP_STATUS_FSFREQ_XDATAINCOMPLETE) that signal that the data received using Exchange Config Data or Exchange Port Data was incomplete. This new flags is set in the respective handlers during the response path. With this patch, only the synchronous FSF-functions for each command got support for the new flag, otherwise it is transparent. Together with this new flag and already existing status flags the synchronous FSF-functions are extended to now detect whether the received data is complete, incomplete or completely invalid (this includes cases where a command ran into a timeout). This is now signaled back to the caller, where previously only failures on the request path would result in a bad return-code. For complete data the return-code remains 0. For incomplete data a new return-code -EAGAIN is added to the function-interface. For completely invalid data the already existing return-code -EIO is reused - formerly this was used to signal failures on the request path. Existing callers of the FSF-functions are adjusted so that they behave as before for return-code 0 and -EAGAIN, to not change the user-interface. As -EIO existed all along, it was already exposed to the user - and needed handling - and will now also be exposed in this new special case. Link: https://lore.kernel.org/r/e14f0702fa2b00a4d1f37c7981a13f2dd1ea2c83.1572018130.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-25 16:12:43 +00:00
if (req->status &
(ZFCP_STATUS_FSFREQ_ERROR | ZFCP_STATUS_FSFREQ_DISMISSED))
retval = -EIO;
else if (req->status & ZFCP_STATUS_FSFREQ_XDATAINCOMPLETE)
retval = -EAGAIN;
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
}
zfcp_fsf_req_free(req);
return retval;
out_unlock:
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_unlock_irq(&qdio->req_q_lock);
return retval;
}
/**
* zfcp_fsf_exchange_port_data - request information about local port
* @erp_action: ERP action for the adapter for which port data is requested
* Returns: 0 on success, error otherwise
*/
int zfcp_fsf_exchange_port_data(struct zfcp_erp_action *erp_action)
{
struct zfcp_qdio *qdio = erp_action->adapter->qdio;
struct zfcp_fsf_req *req;
int retval = -EIO;
if (!(qdio->adapter->adapter_features & FSF_FEATURE_HBAAPI_MANAGEMENT))
return -EOPNOTSUPP;
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_lock_irq(&qdio->req_q_lock);
if (zfcp_qdio_sbal_get(qdio))
goto out;
req = zfcp_fsf_req_create(qdio, FSF_QTCB_EXCHANGE_PORT_DATA,
SBAL_SFLAGS0_TYPE_READ,
qdio->adapter->pool.erp_req);
if (IS_ERR(req)) {
retval = PTR_ERR(req);
goto out;
}
req->status |= ZFCP_STATUS_FSFREQ_CLEANUP;
zfcp_qdio_set_sbale_last(qdio, &req->qdio_req);
req->handler = zfcp_fsf_exchange_port_data_handler;
req->erp_action = erp_action;
erp_action->fsf_req_id = req->req_id;
zfcp_fsf_start_erp_timer(req);
retval = zfcp_fsf_req_send(req);
if (retval) {
zfcp_fsf_req_free(req);
erp_action->fsf_req_id = 0;
}
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
/* NOTE: DO NOT TOUCH req PAST THIS POINT! */
out:
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_unlock_irq(&qdio->req_q_lock);
return retval;
}
/**
scsi: zfcp: signal incomplete or error for sync exchange config/port data Adds a new FSF-Request status flag (ZFCP_STATUS_FSFREQ_XDATAINCOMPLETE) that signal that the data received using Exchange Config Data or Exchange Port Data was incomplete. This new flags is set in the respective handlers during the response path. With this patch, only the synchronous FSF-functions for each command got support for the new flag, otherwise it is transparent. Together with this new flag and already existing status flags the synchronous FSF-functions are extended to now detect whether the received data is complete, incomplete or completely invalid (this includes cases where a command ran into a timeout). This is now signaled back to the caller, where previously only failures on the request path would result in a bad return-code. For complete data the return-code remains 0. For incomplete data a new return-code -EAGAIN is added to the function-interface. For completely invalid data the already existing return-code -EIO is reused - formerly this was used to signal failures on the request path. Existing callers of the FSF-functions are adjusted so that they behave as before for return-code 0 and -EAGAIN, to not change the user-interface. As -EIO existed all along, it was already exposed to the user - and needed handling - and will now also be exposed in this new special case. Link: https://lore.kernel.org/r/e14f0702fa2b00a4d1f37c7981a13f2dd1ea2c83.1572018130.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-25 16:12:43 +00:00
* zfcp_fsf_exchange_port_data_sync() - Request information about local port.
* @qdio: pointer to the QDIO-Queue to use for sending the command.
* @data: pointer to the QTCB-Bottom for storing the result of the command,
* might be %NULL.
*
* Returns:
* * 0 - Exchange Port Data was successful, @data is complete
* * -EIO - Exchange Port Data was not successful, @data is invalid
* * -EAGAIN - @data contains incomplete data
* * -ENOMEM - Some memory allocation failed along the way
* * -EOPNOTSUPP - This operation is not supported
*/
int zfcp_fsf_exchange_port_data_sync(struct zfcp_qdio *qdio,
struct fsf_qtcb_bottom_port *data)
{
struct zfcp_fsf_req *req = NULL;
int retval = -EIO;
if (!(qdio->adapter->adapter_features & FSF_FEATURE_HBAAPI_MANAGEMENT))
return -EOPNOTSUPP;
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_lock_irq(&qdio->req_q_lock);
if (zfcp_qdio_sbal_get(qdio))
goto out_unlock;
req = zfcp_fsf_req_create(qdio, FSF_QTCB_EXCHANGE_PORT_DATA,
SBAL_SFLAGS0_TYPE_READ, NULL);
if (IS_ERR(req)) {
retval = PTR_ERR(req);
goto out_unlock;
}
if (data)
req->data = data;
zfcp_qdio_set_sbale_last(qdio, &req->qdio_req);
req->handler = zfcp_fsf_exchange_port_data_handler;
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_unlock_irq(&qdio->req_q_lock);
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
if (!retval) {
/* NOTE: ONLY TOUCH SYNC req AGAIN ON req->completion. */
wait_for_completion(&req->completion);
scsi: zfcp: signal incomplete or error for sync exchange config/port data Adds a new FSF-Request status flag (ZFCP_STATUS_FSFREQ_XDATAINCOMPLETE) that signal that the data received using Exchange Config Data or Exchange Port Data was incomplete. This new flags is set in the respective handlers during the response path. With this patch, only the synchronous FSF-functions for each command got support for the new flag, otherwise it is transparent. Together with this new flag and already existing status flags the synchronous FSF-functions are extended to now detect whether the received data is complete, incomplete or completely invalid (this includes cases where a command ran into a timeout). This is now signaled back to the caller, where previously only failures on the request path would result in a bad return-code. For complete data the return-code remains 0. For incomplete data a new return-code -EAGAIN is added to the function-interface. For completely invalid data the already existing return-code -EIO is reused - formerly this was used to signal failures on the request path. Existing callers of the FSF-functions are adjusted so that they behave as before for return-code 0 and -EAGAIN, to not change the user-interface. As -EIO existed all along, it was already exposed to the user - and needed handling - and will now also be exposed in this new special case. Link: https://lore.kernel.org/r/e14f0702fa2b00a4d1f37c7981a13f2dd1ea2c83.1572018130.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-25 16:12:43 +00:00
if (req->status &
(ZFCP_STATUS_FSFREQ_ERROR | ZFCP_STATUS_FSFREQ_DISMISSED))
retval = -EIO;
else if (req->status & ZFCP_STATUS_FSFREQ_XDATAINCOMPLETE)
retval = -EAGAIN;
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
}
zfcp_fsf_req_free(req);
return retval;
out_unlock:
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_unlock_irq(&qdio->req_q_lock);
return retval;
}
static void zfcp_fsf_open_port_handler(struct zfcp_fsf_req *req)
{
struct zfcp_port *port = req->data;
struct fsf_qtcb_header *header = &req->qtcb->header;
struct fc_els_flogi *plogi;
if (req->status & ZFCP_STATUS_FSFREQ_ERROR)
goto out;
switch (header->fsf_status) {
case FSF_PORT_ALREADY_OPEN:
break;
case FSF_MAXIMUM_NUMBER_OF_PORTS_EXCEEDED:
dev_warn(&req->adapter->ccw_device->dev,
"Not enough FCP adapter resources to open "
"remote port 0x%016Lx\n",
(unsigned long long)port->wwpn);
zfcp_erp_set_port_status(port,
ZFCP_STATUS_COMMON_ERP_FAILED);
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_ADAPTER_STATUS_AVAILABLE:
switch (header->fsf_status_qual.word[0]) {
case FSF_SQ_INVOKE_LINK_TEST_PROCEDURE:
/* no zfcp_fc_test_link() with failed open port */
/* fall through */
case FSF_SQ_ULP_DEPENDENT_ERP_REQUIRED:
case FSF_SQ_NO_RETRY_POSSIBLE:
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
}
break;
case FSF_GOOD:
port->handle = header->port_handle;
atomic_or(ZFCP_STATUS_COMMON_OPEN |
ZFCP_STATUS_PORT_PHYS_OPEN, &port->status);
atomic_andnot(ZFCP_STATUS_COMMON_ACCESS_BOXED,
&port->status);
/* check whether D_ID has changed during open */
/*
* FIXME: This check is not airtight, as the FCP channel does
* not monitor closures of target port connections caused on
* the remote side. Thus, they might miss out on invalidating
* locally cached WWPNs (and other N_Port parameters) of gone
* target ports. So, our heroic attempt to make things safe
* could be undermined by 'open port' response data tagged with
* obsolete WWPNs. Another reason to monitor potential
* connection closures ourself at least (by interpreting
* incoming ELS' and unsolicited status). It just crosses my
* mind that one should be able to cross-check by means of
* another GID_PN straight after a port has been opened.
* Alternately, an ADISC/PDISC ELS should suffice, as well.
*/
plogi = (struct fc_els_flogi *) req->qtcb->bottom.support.els;
if (req->qtcb->bottom.support.els1_length >=
FSF_PLOGI_MIN_LEN)
zfcp_fc_plogi_evaluate(port, plogi);
break;
case FSF_UNKNOWN_OP_SUBTYPE:
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
}
out:
put_device(&port->dev);
}
/**
* zfcp_fsf_open_port - create and send open port request
* @erp_action: pointer to struct zfcp_erp_action
* Returns: 0 on success, error otherwise
*/
int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action)
{
struct zfcp_qdio *qdio = erp_action->adapter->qdio;
struct zfcp_port *port = erp_action->port;
struct zfcp_fsf_req *req;
int retval = -EIO;
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_lock_irq(&qdio->req_q_lock);
if (zfcp_qdio_sbal_get(qdio))
goto out;
req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID,
SBAL_SFLAGS0_TYPE_READ,
qdio->adapter->pool.erp_req);
if (IS_ERR(req)) {
retval = PTR_ERR(req);
goto out;
}
req->status |= ZFCP_STATUS_FSFREQ_CLEANUP;
zfcp_qdio_set_sbale_last(qdio, &req->qdio_req);
req->handler = zfcp_fsf_open_port_handler;
hton24(req->qtcb->bottom.support.d_id, port->d_id);
req->data = port;
req->erp_action = erp_action;
erp_action->fsf_req_id = req->req_id;
get_device(&port->dev);
zfcp_fsf_start_erp_timer(req);
retval = zfcp_fsf_req_send(req);
if (retval) {
zfcp_fsf_req_free(req);
erp_action->fsf_req_id = 0;
put_device(&port->dev);
}
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
/* NOTE: DO NOT TOUCH req PAST THIS POINT! */
out:
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_unlock_irq(&qdio->req_q_lock);
return retval;
}
static void zfcp_fsf_close_port_handler(struct zfcp_fsf_req *req)
{
struct zfcp_port *port = req->data;
if (req->status & ZFCP_STATUS_FSFREQ_ERROR)
return;
switch (req->qtcb->header.fsf_status) {
case FSF_PORT_HANDLE_NOT_VALID:
zfcp_erp_adapter_reopen(port->adapter, 0, "fscph_1");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_ADAPTER_STATUS_AVAILABLE:
break;
case FSF_GOOD:
zfcp_erp_clear_port_status(port, ZFCP_STATUS_COMMON_OPEN);
break;
}
}
/**
* zfcp_fsf_close_port - create and send close port request
* @erp_action: pointer to struct zfcp_erp_action
* Returns: 0 on success, error otherwise
*/
int zfcp_fsf_close_port(struct zfcp_erp_action *erp_action)
{
struct zfcp_qdio *qdio = erp_action->adapter->qdio;
struct zfcp_fsf_req *req;
int retval = -EIO;
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_lock_irq(&qdio->req_q_lock);
if (zfcp_qdio_sbal_get(qdio))
goto out;
req = zfcp_fsf_req_create(qdio, FSF_QTCB_CLOSE_PORT,
SBAL_SFLAGS0_TYPE_READ,
qdio->adapter->pool.erp_req);
if (IS_ERR(req)) {
retval = PTR_ERR(req);
goto out;
}
req->status |= ZFCP_STATUS_FSFREQ_CLEANUP;
zfcp_qdio_set_sbale_last(qdio, &req->qdio_req);
req->handler = zfcp_fsf_close_port_handler;
req->data = erp_action->port;
req->erp_action = erp_action;
req->qtcb->header.port_handle = erp_action->port->handle;
erp_action->fsf_req_id = req->req_id;
zfcp_fsf_start_erp_timer(req);
retval = zfcp_fsf_req_send(req);
if (retval) {
zfcp_fsf_req_free(req);
erp_action->fsf_req_id = 0;
}
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
/* NOTE: DO NOT TOUCH req PAST THIS POINT! */
out:
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_unlock_irq(&qdio->req_q_lock);
return retval;
}
static void zfcp_fsf_open_wka_port_handler(struct zfcp_fsf_req *req)
{
struct zfcp_fc_wka_port *wka_port = req->data;
struct fsf_qtcb_header *header = &req->qtcb->header;
if (req->status & ZFCP_STATUS_FSFREQ_ERROR) {
wka_port->status = ZFCP_FC_WKA_PORT_OFFLINE;
goto out;
}
switch (header->fsf_status) {
case FSF_MAXIMUM_NUMBER_OF_PORTS_EXCEEDED:
dev_warn(&req->adapter->ccw_device->dev,
"Opening WKA port 0x%x failed\n", wka_port->d_id);
/* fall through */
case FSF_ADAPTER_STATUS_AVAILABLE:
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
wka_port->status = ZFCP_FC_WKA_PORT_OFFLINE;
break;
case FSF_GOOD:
wka_port->handle = header->port_handle;
/* fall through */
case FSF_PORT_ALREADY_OPEN:
wka_port->status = ZFCP_FC_WKA_PORT_ONLINE;
}
out:
wake_up(&wka_port->completion_wq);
}
/**
* zfcp_fsf_open_wka_port - create and send open wka-port request
* @wka_port: pointer to struct zfcp_fc_wka_port
* Returns: 0 on success, error otherwise
*/
int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
scsi: zfcp: fix use-after-free by not tracing WKA port open/close on failed send Dan Carpenter kindly reported: <quote> The patch d27a7cb91960: "zfcp: trace on request for open and close of WKA port" from Aug 10, 2016, leads to the following static checker warning: drivers/s390/scsi/zfcp_fsf.c:1615 zfcp_fsf_open_wka_port() warn: 'req' was already freed. drivers/s390/scsi/zfcp_fsf.c 1609 zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT); 1610 retval = zfcp_fsf_req_send(req); 1611 if (retval) 1612 zfcp_fsf_req_free(req); ^^^ Freed. 1613 out: 1614 spin_unlock_irq(&qdio->req_q_lock); 1615 if (req && !IS_ERR(req)) 1616 zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id); ^^^^^^^^^^^ Use after free. 1617 return retval; 1618 } Same thing for zfcp_fsf_close_wka_port() as well. </quote> Rather than relying on req being NULL (or ERR_PTR) for all cases where we don't want to trace or should not trace, simply check retval which is unconditionally initialized with -EIO != 0 and it can only become 0 on successful retval = zfcp_fsf_req_send(req). With that we can also remove the then again unnecessary unconditional initialization of req which was introduced with that earlier commit. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Suggested-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: d27a7cb91960 ("zfcp: trace on request for open and close of WKA port") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Jens Remus <jremus@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-02-08 14:34:22 +00:00
struct zfcp_fsf_req *req;
unsigned long req_id = 0;
int retval = -EIO;
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_lock_irq(&qdio->req_q_lock);
if (zfcp_qdio_sbal_get(qdio))
goto out;
req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID,
SBAL_SFLAGS0_TYPE_READ,
qdio->adapter->pool.erp_req);
if (IS_ERR(req)) {
retval = PTR_ERR(req);
goto out;
}
req->status |= ZFCP_STATUS_FSFREQ_CLEANUP;
zfcp_qdio_set_sbale_last(qdio, &req->qdio_req);
req->handler = zfcp_fsf_open_wka_port_handler;
hton24(req->qtcb->bottom.support.d_id, wka_port->d_id);
req->data = wka_port;
req_id = req->req_id;
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
zfcp_fsf_req_free(req);
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
/* NOTE: DO NOT TOUCH req PAST THIS POINT! */
out:
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_unlock_irq(&qdio->req_q_lock);
scsi: zfcp: fix use-after-free by not tracing WKA port open/close on failed send Dan Carpenter kindly reported: <quote> The patch d27a7cb91960: "zfcp: trace on request for open and close of WKA port" from Aug 10, 2016, leads to the following static checker warning: drivers/s390/scsi/zfcp_fsf.c:1615 zfcp_fsf_open_wka_port() warn: 'req' was already freed. drivers/s390/scsi/zfcp_fsf.c 1609 zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT); 1610 retval = zfcp_fsf_req_send(req); 1611 if (retval) 1612 zfcp_fsf_req_free(req); ^^^ Freed. 1613 out: 1614 spin_unlock_irq(&qdio->req_q_lock); 1615 if (req && !IS_ERR(req)) 1616 zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id); ^^^^^^^^^^^ Use after free. 1617 return retval; 1618 } Same thing for zfcp_fsf_close_wka_port() as well. </quote> Rather than relying on req being NULL (or ERR_PTR) for all cases where we don't want to trace or should not trace, simply check retval which is unconditionally initialized with -EIO != 0 and it can only become 0 on successful retval = zfcp_fsf_req_send(req). With that we can also remove the then again unnecessary unconditional initialization of req which was introduced with that earlier commit. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Suggested-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: d27a7cb91960 ("zfcp: trace on request for open and close of WKA port") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Jens Remus <jremus@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-02-08 14:34:22 +00:00
if (!retval)
zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req_id);
return retval;
}
static void zfcp_fsf_close_wka_port_handler(struct zfcp_fsf_req *req)
{
struct zfcp_fc_wka_port *wka_port = req->data;
if (req->qtcb->header.fsf_status == FSF_PORT_HANDLE_NOT_VALID) {
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
zfcp_erp_adapter_reopen(wka_port->adapter, 0, "fscwph1");
}
wka_port->status = ZFCP_FC_WKA_PORT_OFFLINE;
wake_up(&wka_port->completion_wq);
}
/**
* zfcp_fsf_close_wka_port - create and send close wka port request
* @wka_port: WKA port to open
* Returns: 0 on success, error otherwise
*/
int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
scsi: zfcp: fix use-after-free by not tracing WKA port open/close on failed send Dan Carpenter kindly reported: <quote> The patch d27a7cb91960: "zfcp: trace on request for open and close of WKA port" from Aug 10, 2016, leads to the following static checker warning: drivers/s390/scsi/zfcp_fsf.c:1615 zfcp_fsf_open_wka_port() warn: 'req' was already freed. drivers/s390/scsi/zfcp_fsf.c 1609 zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT); 1610 retval = zfcp_fsf_req_send(req); 1611 if (retval) 1612 zfcp_fsf_req_free(req); ^^^ Freed. 1613 out: 1614 spin_unlock_irq(&qdio->req_q_lock); 1615 if (req && !IS_ERR(req)) 1616 zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id); ^^^^^^^^^^^ Use after free. 1617 return retval; 1618 } Same thing for zfcp_fsf_close_wka_port() as well. </quote> Rather than relying on req being NULL (or ERR_PTR) for all cases where we don't want to trace or should not trace, simply check retval which is unconditionally initialized with -EIO != 0 and it can only become 0 on successful retval = zfcp_fsf_req_send(req). With that we can also remove the then again unnecessary unconditional initialization of req which was introduced with that earlier commit. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Suggested-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: d27a7cb91960 ("zfcp: trace on request for open and close of WKA port") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Jens Remus <jremus@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-02-08 14:34:22 +00:00
struct zfcp_fsf_req *req;
unsigned long req_id = 0;
int retval = -EIO;
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_lock_irq(&qdio->req_q_lock);
if (zfcp_qdio_sbal_get(qdio))
goto out;
req = zfcp_fsf_req_create(qdio, FSF_QTCB_CLOSE_PORT,
SBAL_SFLAGS0_TYPE_READ,
qdio->adapter->pool.erp_req);
if (IS_ERR(req)) {
retval = PTR_ERR(req);
goto out;
}
req->status |= ZFCP_STATUS_FSFREQ_CLEANUP;
zfcp_qdio_set_sbale_last(qdio, &req->qdio_req);
req->handler = zfcp_fsf_close_wka_port_handler;
req->data = wka_port;
req->qtcb->header.port_handle = wka_port->handle;
req_id = req->req_id;
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
zfcp_fsf_req_free(req);
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
/* NOTE: DO NOT TOUCH req PAST THIS POINT! */
out:
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_unlock_irq(&qdio->req_q_lock);
scsi: zfcp: fix use-after-free by not tracing WKA port open/close on failed send Dan Carpenter kindly reported: <quote> The patch d27a7cb91960: "zfcp: trace on request for open and close of WKA port" from Aug 10, 2016, leads to the following static checker warning: drivers/s390/scsi/zfcp_fsf.c:1615 zfcp_fsf_open_wka_port() warn: 'req' was already freed. drivers/s390/scsi/zfcp_fsf.c 1609 zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT); 1610 retval = zfcp_fsf_req_send(req); 1611 if (retval) 1612 zfcp_fsf_req_free(req); ^^^ Freed. 1613 out: 1614 spin_unlock_irq(&qdio->req_q_lock); 1615 if (req && !IS_ERR(req)) 1616 zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id); ^^^^^^^^^^^ Use after free. 1617 return retval; 1618 } Same thing for zfcp_fsf_close_wka_port() as well. </quote> Rather than relying on req being NULL (or ERR_PTR) for all cases where we don't want to trace or should not trace, simply check retval which is unconditionally initialized with -EIO != 0 and it can only become 0 on successful retval = zfcp_fsf_req_send(req). With that we can also remove the then again unnecessary unconditional initialization of req which was introduced with that earlier commit. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Suggested-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: d27a7cb91960 ("zfcp: trace on request for open and close of WKA port") Cc: <stable@vger.kernel.org> #2.6.38+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Jens Remus <jremus@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-02-08 14:34:22 +00:00
if (!retval)
zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req_id);
return retval;
}
static void zfcp_fsf_close_physical_port_handler(struct zfcp_fsf_req *req)
{
struct zfcp_port *port = req->data;
struct fsf_qtcb_header *header = &req->qtcb->header;
struct scsi_device *sdev;
if (req->status & ZFCP_STATUS_FSFREQ_ERROR)
return;
switch (header->fsf_status) {
case FSF_PORT_HANDLE_NOT_VALID:
zfcp_erp_adapter_reopen(port->adapter, 0, "fscpph1");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_PORT_BOXED:
/* can't use generic zfcp_erp_modify_port_status because
* ZFCP_STATUS_COMMON_OPEN must not be reset for the port */
atomic_andnot(ZFCP_STATUS_PORT_PHYS_OPEN, &port->status);
shost_for_each_device(sdev, port->adapter->scsi_host)
if (sdev_to_zfcp(sdev)->port == port)
atomic_andnot(ZFCP_STATUS_COMMON_OPEN,
&sdev_to_zfcp(sdev)->status);
zfcp_erp_set_port_status(port, ZFCP_STATUS_COMMON_ACCESS_BOXED);
zfcp_erp_port_reopen(port, ZFCP_STATUS_COMMON_ERP_FAILED,
"fscpph2");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_ADAPTER_STATUS_AVAILABLE:
switch (header->fsf_status_qual.word[0]) {
case FSF_SQ_INVOKE_LINK_TEST_PROCEDURE:
/* fall through */
case FSF_SQ_ULP_DEPENDENT_ERP_REQUIRED:
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
}
break;
case FSF_GOOD:
/* can't use generic zfcp_erp_modify_port_status because
* ZFCP_STATUS_COMMON_OPEN must not be reset for the port
*/
atomic_andnot(ZFCP_STATUS_PORT_PHYS_OPEN, &port->status);
shost_for_each_device(sdev, port->adapter->scsi_host)
if (sdev_to_zfcp(sdev)->port == port)
atomic_andnot(ZFCP_STATUS_COMMON_OPEN,
&sdev_to_zfcp(sdev)->status);
break;
}
}
/**
* zfcp_fsf_close_physical_port - close physical port
* @erp_action: pointer to struct zfcp_erp_action
* Returns: 0 on success
*/
int zfcp_fsf_close_physical_port(struct zfcp_erp_action *erp_action)
{
struct zfcp_qdio *qdio = erp_action->adapter->qdio;
struct zfcp_fsf_req *req;
int retval = -EIO;
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_lock_irq(&qdio->req_q_lock);
if (zfcp_qdio_sbal_get(qdio))
goto out;
req = zfcp_fsf_req_create(qdio, FSF_QTCB_CLOSE_PHYSICAL_PORT,
SBAL_SFLAGS0_TYPE_READ,
qdio->adapter->pool.erp_req);
if (IS_ERR(req)) {
retval = PTR_ERR(req);
goto out;
}
req->status |= ZFCP_STATUS_FSFREQ_CLEANUP;
zfcp_qdio_set_sbale_last(qdio, &req->qdio_req);
req->data = erp_action->port;
req->qtcb->header.port_handle = erp_action->port->handle;
req->erp_action = erp_action;
req->handler = zfcp_fsf_close_physical_port_handler;
erp_action->fsf_req_id = req->req_id;
zfcp_fsf_start_erp_timer(req);
retval = zfcp_fsf_req_send(req);
if (retval) {
zfcp_fsf_req_free(req);
erp_action->fsf_req_id = 0;
}
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
/* NOTE: DO NOT TOUCH req PAST THIS POINT! */
out:
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_unlock_irq(&qdio->req_q_lock);
return retval;
}
static void zfcp_fsf_open_lun_handler(struct zfcp_fsf_req *req)
{
struct zfcp_adapter *adapter = req->adapter;
struct scsi_device *sdev = req->data;
2012-09-04 13:23:36 +00:00
struct zfcp_scsi_dev *zfcp_sdev;
struct fsf_qtcb_header *header = &req->qtcb->header;
union fsf_status_qual *qual = &header->fsf_status_qual;
if (req->status & ZFCP_STATUS_FSFREQ_ERROR)
return;
2012-09-04 13:23:36 +00:00
zfcp_sdev = sdev_to_zfcp(sdev);
atomic_andnot(ZFCP_STATUS_COMMON_ACCESS_DENIED |
ZFCP_STATUS_COMMON_ACCESS_BOXED,
&zfcp_sdev->status);
switch (header->fsf_status) {
case FSF_PORT_HANDLE_NOT_VALID:
zfcp_erp_adapter_reopen(adapter, 0, "fsouh_1");
/* fall through */
case FSF_LUN_ALREADY_OPEN:
break;
case FSF_PORT_BOXED:
zfcp_erp_set_port_status(zfcp_sdev->port,
ZFCP_STATUS_COMMON_ACCESS_BOXED);
zfcp_erp_port_reopen(zfcp_sdev->port,
ZFCP_STATUS_COMMON_ERP_FAILED, "fsouh_2");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_LUN_SHARING_VIOLATION:
if (qual->word[0])
dev_warn(&zfcp_sdev->port->adapter->ccw_device->dev,
"LUN 0x%016Lx on port 0x%016Lx is already in "
"use by CSS%d, MIF Image ID %x\n",
zfcp_scsi_dev_lun(sdev),
(unsigned long long)zfcp_sdev->port->wwpn,
qual->fsf_queue_designator.cssid,
qual->fsf_queue_designator.hla);
zfcp_erp_set_lun_status(sdev,
ZFCP_STATUS_COMMON_ERP_FAILED |
ZFCP_STATUS_COMMON_ACCESS_DENIED);
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_MAXIMUM_NUMBER_OF_LUNS_EXCEEDED:
dev_warn(&adapter->ccw_device->dev,
"No handle is available for LUN "
"0x%016Lx on port 0x%016Lx\n",
(unsigned long long)zfcp_scsi_dev_lun(sdev),
(unsigned long long)zfcp_sdev->port->wwpn);
zfcp_erp_set_lun_status(sdev, ZFCP_STATUS_COMMON_ERP_FAILED);
/* fall through */
case FSF_INVALID_COMMAND_OPTION:
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_ADAPTER_STATUS_AVAILABLE:
switch (header->fsf_status_qual.word[0]) {
case FSF_SQ_INVOKE_LINK_TEST_PROCEDURE:
zfcp_fc_test_link(zfcp_sdev->port);
/* fall through */
case FSF_SQ_ULP_DEPENDENT_ERP_REQUIRED:
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
}
break;
case FSF_GOOD:
zfcp_sdev->lun_handle = header->lun_handle;
atomic_or(ZFCP_STATUS_COMMON_OPEN, &zfcp_sdev->status);
break;
}
}
/**
* zfcp_fsf_open_lun - open LUN
* @erp_action: pointer to struct zfcp_erp_action
* Returns: 0 on success, error otherwise
*/
int zfcp_fsf_open_lun(struct zfcp_erp_action *erp_action)
{
struct zfcp_adapter *adapter = erp_action->adapter;
struct zfcp_qdio *qdio = adapter->qdio;
struct zfcp_fsf_req *req;
int retval = -EIO;
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_lock_irq(&qdio->req_q_lock);
if (zfcp_qdio_sbal_get(qdio))
goto out;
req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_LUN,
SBAL_SFLAGS0_TYPE_READ,
adapter->pool.erp_req);
if (IS_ERR(req)) {
retval = PTR_ERR(req);
goto out;
}
req->status |= ZFCP_STATUS_FSFREQ_CLEANUP;
zfcp_qdio_set_sbale_last(qdio, &req->qdio_req);
req->qtcb->header.port_handle = erp_action->port->handle;
req->qtcb->bottom.support.fcp_lun = zfcp_scsi_dev_lun(erp_action->sdev);
req->handler = zfcp_fsf_open_lun_handler;
req->data = erp_action->sdev;
req->erp_action = erp_action;
erp_action->fsf_req_id = req->req_id;
if (!(adapter->connection_features & FSF_FEATURE_NPIV_MODE))
req->qtcb->bottom.support.option = FSF_OPEN_LUN_SUPPRESS_BOXING;
zfcp_fsf_start_erp_timer(req);
retval = zfcp_fsf_req_send(req);
if (retval) {
zfcp_fsf_req_free(req);
erp_action->fsf_req_id = 0;
}
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
/* NOTE: DO NOT TOUCH req PAST THIS POINT! */
out:
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_unlock_irq(&qdio->req_q_lock);
return retval;
}
static void zfcp_fsf_close_lun_handler(struct zfcp_fsf_req *req)
{
struct scsi_device *sdev = req->data;
2012-09-04 13:23:36 +00:00
struct zfcp_scsi_dev *zfcp_sdev;
if (req->status & ZFCP_STATUS_FSFREQ_ERROR)
return;
2012-09-04 13:23:36 +00:00
zfcp_sdev = sdev_to_zfcp(sdev);
switch (req->qtcb->header.fsf_status) {
case FSF_PORT_HANDLE_NOT_VALID:
zfcp_erp_adapter_reopen(zfcp_sdev->port->adapter, 0, "fscuh_1");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_LUN_HANDLE_NOT_VALID:
zfcp_erp_port_reopen(zfcp_sdev->port, 0, "fscuh_2");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_PORT_BOXED:
zfcp_erp_set_port_status(zfcp_sdev->port,
ZFCP_STATUS_COMMON_ACCESS_BOXED);
zfcp_erp_port_reopen(zfcp_sdev->port,
ZFCP_STATUS_COMMON_ERP_FAILED, "fscuh_3");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_ADAPTER_STATUS_AVAILABLE:
switch (req->qtcb->header.fsf_status_qual.word[0]) {
case FSF_SQ_INVOKE_LINK_TEST_PROCEDURE:
zfcp_fc_test_link(zfcp_sdev->port);
/* fall through */
case FSF_SQ_ULP_DEPENDENT_ERP_REQUIRED:
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
}
break;
case FSF_GOOD:
atomic_andnot(ZFCP_STATUS_COMMON_OPEN, &zfcp_sdev->status);
break;
}
}
/**
* zfcp_fsf_close_LUN - close LUN
* @erp_action: pointer to erp_action triggering the "close LUN"
* Returns: 0 on success, error otherwise
*/
int zfcp_fsf_close_lun(struct zfcp_erp_action *erp_action)
{
struct zfcp_qdio *qdio = erp_action->adapter->qdio;
struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(erp_action->sdev);
struct zfcp_fsf_req *req;
int retval = -EIO;
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_lock_irq(&qdio->req_q_lock);
if (zfcp_qdio_sbal_get(qdio))
goto out;
req = zfcp_fsf_req_create(qdio, FSF_QTCB_CLOSE_LUN,
SBAL_SFLAGS0_TYPE_READ,
qdio->adapter->pool.erp_req);
if (IS_ERR(req)) {
retval = PTR_ERR(req);
goto out;
}
req->status |= ZFCP_STATUS_FSFREQ_CLEANUP;
zfcp_qdio_set_sbale_last(qdio, &req->qdio_req);
req->qtcb->header.port_handle = erp_action->port->handle;
req->qtcb->header.lun_handle = zfcp_sdev->lun_handle;
req->handler = zfcp_fsf_close_lun_handler;
req->data = erp_action->sdev;
req->erp_action = erp_action;
erp_action->fsf_req_id = req->req_id;
zfcp_fsf_start_erp_timer(req);
retval = zfcp_fsf_req_send(req);
if (retval) {
zfcp_fsf_req_free(req);
erp_action->fsf_req_id = 0;
}
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
/* NOTE: DO NOT TOUCH req PAST THIS POINT! */
out:
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_unlock_irq(&qdio->req_q_lock);
return retval;
}
static void zfcp_fsf_update_lat(struct zfcp_latency_record *lat_rec, u32 lat)
{
lat_rec->sum += lat;
lat_rec->min = min(lat_rec->min, lat);
lat_rec->max = max(lat_rec->max, lat);
}
static void zfcp_fsf_req_trace(struct zfcp_fsf_req *req, struct scsi_cmnd *scsi)
{
struct fsf_qual_latency_info *lat_in;
struct zfcp_latency_cont *lat = NULL;
2012-09-04 13:23:36 +00:00
struct zfcp_scsi_dev *zfcp_sdev;
struct zfcp_blk_drv_data blktrc;
int ticks = req->adapter->timer_ticks;
lat_in = &req->qtcb->prefix.prot_status_qual.latency_info;
blktrc.flags = 0;
blktrc.magic = ZFCP_BLK_DRV_DATA_MAGIC;
if (req->status & ZFCP_STATUS_FSFREQ_ERROR)
blktrc.flags |= ZFCP_BLK_REQ_ERROR;
blktrc.inb_usage = 0;
blktrc.outb_usage = req->qdio_req.qdio_outb_usage;
if (req->adapter->adapter_features & FSF_FEATURE_MEASUREMENT_DATA &&
!(req->status & ZFCP_STATUS_FSFREQ_ERROR)) {
2012-09-04 13:23:36 +00:00
zfcp_sdev = sdev_to_zfcp(scsi->device);
blktrc.flags |= ZFCP_BLK_LAT_VALID;
blktrc.channel_lat = lat_in->channel_lat * ticks;
blktrc.fabric_lat = lat_in->fabric_lat * ticks;
switch (req->qtcb->bottom.io.data_direction) {
case FSF_DATADIR_DIF_READ_STRIP:
case FSF_DATADIR_DIF_READ_CONVERT:
case FSF_DATADIR_READ:
lat = &zfcp_sdev->latencies.read;
break;
case FSF_DATADIR_DIF_WRITE_INSERT:
case FSF_DATADIR_DIF_WRITE_CONVERT:
case FSF_DATADIR_WRITE:
lat = &zfcp_sdev->latencies.write;
break;
case FSF_DATADIR_CMND:
lat = &zfcp_sdev->latencies.cmd;
break;
}
if (lat) {
spin_lock(&zfcp_sdev->latencies.lock);
zfcp_fsf_update_lat(&lat->channel, lat_in->channel_lat);
zfcp_fsf_update_lat(&lat->fabric, lat_in->fabric_lat);
lat->counter++;
spin_unlock(&zfcp_sdev->latencies.lock);
}
}
blk_add_driver_data(scsi->request->q, scsi->request, &blktrc,
sizeof(blktrc));
}
/**
* zfcp_fsf_fcp_handler_common() - FCP response handler common to I/O and TMF.
* @req: Pointer to FSF request.
* @sdev: Pointer to SCSI device as request context.
*/
static void zfcp_fsf_fcp_handler_common(struct zfcp_fsf_req *req,
struct scsi_device *sdev)
{
2012-09-04 13:23:36 +00:00
struct zfcp_scsi_dev *zfcp_sdev;
struct fsf_qtcb_header *header = &req->qtcb->header;
if (unlikely(req->status & ZFCP_STATUS_FSFREQ_ERROR))
return;
2012-09-04 13:23:36 +00:00
zfcp_sdev = sdev_to_zfcp(sdev);
switch (header->fsf_status) {
case FSF_HANDLE_MISMATCH:
case FSF_PORT_HANDLE_NOT_VALID:
zfcp_erp_adapter_reopen(req->adapter, 0, "fssfch1");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_FCPLUN_NOT_VALID:
case FSF_LUN_HANDLE_NOT_VALID:
zfcp_erp_port_reopen(zfcp_sdev->port, 0, "fssfch2");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_SERVICE_CLASS_NOT_SUPPORTED:
zfcp_fsf_class_not_supp(req);
break;
case FSF_DIRECTION_INDICATOR_NOT_VALID:
dev_err(&req->adapter->ccw_device->dev,
"Incorrect direction %d, LUN 0x%016Lx on port "
"0x%016Lx closed\n",
req->qtcb->bottom.io.data_direction,
(unsigned long long)zfcp_scsi_dev_lun(sdev),
(unsigned long long)zfcp_sdev->port->wwpn);
zfcp_erp_adapter_shutdown(req->adapter, 0, "fssfch3");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_CMND_LENGTH_NOT_VALID:
dev_err(&req->adapter->ccw_device->dev,
"Incorrect FCP_CMND length %d, FCP device closed\n",
req->qtcb->bottom.io.fcp_cmnd_length);
zfcp_erp_adapter_shutdown(req->adapter, 0, "fssfch4");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_PORT_BOXED:
zfcp_erp_set_port_status(zfcp_sdev->port,
ZFCP_STATUS_COMMON_ACCESS_BOXED);
zfcp_erp_port_reopen(zfcp_sdev->port,
ZFCP_STATUS_COMMON_ERP_FAILED, "fssfch5");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_LUN_BOXED:
zfcp_erp_set_lun_status(sdev, ZFCP_STATUS_COMMON_ACCESS_BOXED);
zfcp_erp_lun_reopen(sdev, ZFCP_STATUS_COMMON_ERP_FAILED,
"fssfch6");
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
case FSF_ADAPTER_STATUS_AVAILABLE:
if (header->fsf_status_qual.word[0] ==
FSF_SQ_INVOKE_LINK_TEST_PROCEDURE)
zfcp_fc_test_link(zfcp_sdev->port);
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
break;
}
}
static void zfcp_fsf_fcp_cmnd_handler(struct zfcp_fsf_req *req)
{
struct scsi_cmnd *scpnt;
struct fcp_resp_with_ext *fcp_rsp;
unsigned long flags;
read_lock_irqsave(&req->adapter->abort_lock, flags);
scpnt = req->data;
if (unlikely(!scpnt)) {
read_unlock_irqrestore(&req->adapter->abort_lock, flags);
return;
}
zfcp_fsf_fcp_handler_common(req, scpnt->device);
[SCSI] zfcp: Fix common FCP request reception The reception of a common FCP request should only be evaluated if the corresponding SCSI request data is available. Therefore put the information under the lock protection and verify the existence before processing. This fixes the following kernel panic. Unable to handle kernel pointer dereference at virtual kernel address 0000000180000000 Oops: 003b [#1] PREEMPT SMP DEBUG_PAGEALLOC CPU: 0 Not tainted 2.6.35.7-45.x.20101007-s390xdefault #1 Process blast (pid: 9711, task: 00000000a3be8e40, ksp: 00000000b221bac0) Krnl PSW : 0704300180000000 0000000000489878 (zfcp_fsf_fcp_handler_common+0x4c/0x3a0) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3 Krnl GPRS: 00000000b663c1b8 0000000180000000 000000007ab5bdf0 0000000000000000 00000000b0ccd800 0000000000000018 07000000a3be8e78 00000000b5d3e600 000000007ab5bdf0 0000000000000066 00000000b72137f0 00000000b72137f0 0000000000000000 00000000005a8178 00000000bdf37a60 00000000bdf379f0 Krnl Code: 0000000000489866: e3c030000004 lg %r12,0(%r3) 000000000048986c: e310c0000004 lg %r1,0(%r12) 0000000000489872: e31011e00004 lg %r1,480(%r1) >0000000000489878: 581011ec l %r1,492(%r1) 000000000048987c: a774001c brc 7,4898b4 0000000000489880: b91400b1 lgfr %r11,%r1 0000000000489884: 5810405c l %r1,92(%r4) 0000000000489888: 5510d00c cl %r1,12(%r13) Call Trace: ([<000000000010d344>] debug_event_common+0x22c/0x244) [<000000000048a0b4>] zfcp_fsf_fcp_cmnd_handler+0x2c/0x3b4 [<000000000048b5b6>] zfcp_fsf_req_complete+0x1b6/0x9dc [<000000000048bede>] zfcp_fsf_reqid_check+0x102/0x138 [<000000000048e478>] zfcp_qdio_int_resp+0x70/0x110 [<000000000044a1ec>] qdio_kick_handler+0xb0/0x19c [<000000000044c228>] __tiqdio_inbound_processing+0x30c/0xebc [<000000000014a5fc>] tasklet_action+0x1b4/0x1e8 [<000000000014b676>] __do_softirq+0x106/0x1cc [<000000000010d91a>] do_softirq+0xe6/0xec [<000000000014b0c8>] irq_exit+0xd4/0xd8 [<00000000004307ec>] do_IRQ+0x7c0/0xf54 [<0000000000114d28>] io_return+0x0/0x16 [<000000000055fef0>] sub_preempt_count+0x50/0xe4 ([<00000000b1f873c0>] 0xb1f873c0) [<000000000055e25a>] _raw_spin_unlock+0x46/0x74 [<0000000000241c40>] __d_lookup+0x288/0x2c8 [<000000000023502c>] do_lookup+0x7c/0x25c [<0000000000237fa8>] link_path_walk+0x5e4/0xe2c [<0000000000238a00>] path_walk+0x98/0x148 [<0000000000238c98>] do_path_lookup+0x74/0xc0 [<000000000023989c>] user_path_at+0x64/0xa4 [<000000000022e366>] vfs_fstatat+0x4e/0xb0 [<000000000022e4d6>] SyS_newstat+0x2e/0x54 [<00000000001146de>] sysc_noemu+0x10/0x16 [<0000020000153456>] 0x20000153456 INFO: lockdep is turned off. Last Breaking-Event-Address: [<000000000048a0ae>] zfcp_fsf_fcp_cmnd_handler+0x26/0x3b4 Signed-off-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-11-17 13:23:40 +00:00
if (unlikely(req->status & ZFCP_STATUS_FSFREQ_ERROR)) {
set_host_byte(scpnt, DID_TRANSPORT_DISRUPTED);
goto skip_fsfstatus;
}
switch (req->qtcb->header.fsf_status) {
case FSF_INCONSISTENT_PROT_DATA:
case FSF_INVALID_PROT_PARM:
set_host_byte(scpnt, DID_ERROR);
goto skip_fsfstatus;
case FSF_BLOCK_GUARD_CHECK_FAILURE:
zfcp_scsi_dif_sense_error(scpnt, 0x1);
goto skip_fsfstatus;
case FSF_APP_TAG_CHECK_FAILURE:
zfcp_scsi_dif_sense_error(scpnt, 0x2);
goto skip_fsfstatus;
case FSF_REF_TAG_CHECK_FAILURE:
zfcp_scsi_dif_sense_error(scpnt, 0x3);
goto skip_fsfstatus;
}
BUILD_BUG_ON(sizeof(struct fcp_resp_with_ext) > FSF_FCP_RSP_SIZE);
fcp_rsp = &req->qtcb->bottom.io.fcp_rsp.iu;
zfcp_fc_eval_fcp_rsp(fcp_rsp, scpnt);
skip_fsfstatus:
zfcp_fsf_req_trace(req, scpnt);
zfcp_dbf_scsi_result(scpnt, req);
scpnt->host_scribble = NULL;
(scpnt->scsi_done) (scpnt);
/*
* We must hold this lock until scsi_done has been called.
* Otherwise we may call scsi_done after abort regarding this
* command has completed.
* Note: scsi_done must not block!
*/
read_unlock_irqrestore(&req->adapter->abort_lock, flags);
}
static int zfcp_fsf_set_data_dir(struct scsi_cmnd *scsi_cmnd, u32 *data_dir)
{
switch (scsi_get_prot_op(scsi_cmnd)) {
case SCSI_PROT_NORMAL:
switch (scsi_cmnd->sc_data_direction) {
case DMA_NONE:
*data_dir = FSF_DATADIR_CMND;
break;
case DMA_FROM_DEVICE:
*data_dir = FSF_DATADIR_READ;
break;
case DMA_TO_DEVICE:
*data_dir = FSF_DATADIR_WRITE;
break;
case DMA_BIDIRECTIONAL:
return -EINVAL;
}
break;
case SCSI_PROT_READ_STRIP:
*data_dir = FSF_DATADIR_DIF_READ_STRIP;
break;
case SCSI_PROT_WRITE_INSERT:
*data_dir = FSF_DATADIR_DIF_WRITE_INSERT;
break;
case SCSI_PROT_READ_PASS:
*data_dir = FSF_DATADIR_DIF_READ_CONVERT;
break;
case SCSI_PROT_WRITE_PASS:
*data_dir = FSF_DATADIR_DIF_WRITE_CONVERT;
break;
default:
return -EINVAL;
}
return 0;
}
/**
* zfcp_fsf_fcp_cmnd - initiate an FCP command (for a SCSI command)
* @scsi_cmnd: scsi command to be sent
*/
int zfcp_fsf_fcp_cmnd(struct scsi_cmnd *scsi_cmnd)
{
struct zfcp_fsf_req *req;
struct fcp_cmnd *fcp_cmnd;
u8 sbtype = SBAL_SFLAGS0_TYPE_READ;
int retval = -EIO;
struct scsi_device *sdev = scsi_cmnd->device;
struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);
struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
struct zfcp_qdio *qdio = adapter->qdio;
struct fsf_qtcb_bottom_io *io;
[SCSI] zfcp: Issue FCP command without holding SCSI host_lock Interrupting the connection to the FCP channel while I/O requests are being issued can lead to this deadlock. scsi_dispatch_cmd already holds the host_lock while the recovery trigger tries to acquire the host_lock again when iterating through the scsi_devices. INFO: lockdep is turned off. BUG: spinlock lockup on CPU#1, blast/9660, 0000000078f38878 CPU: 1 Not tainted 2.6.35.7SWEN2 #2 Process blast (pid: 9660, task: 0000000071f75940, ksp: 0000000074393ac0) 0000000074393640 00000000743935c0 0000000000000002 0000000000000000 0000000074393660 00000000743935d8 00000000743935d8 00000000005590c2 0000000000000000 0000000078f38878 0000000026ede800 0000000078f38878 000000000000000d 040000000000000c 0000000074393628 0000000000000000 0000000000000000 0000000000100b2a 00000000743935c0 0000000074393600 Call Trace: ([<0000000000100a32>] show_trace+0xee/0x144) [<00000000003be202>] do_raw_spin_lock+0x112/0x178 [<000000000055d408>] _raw_spin_lock_irqsave+0x90/0xb0 [<00000000003f1514>] __scsi_iterate_devices+0x38/0xbc [<00000000004849b0>] zfcp_erp_clear_adapter_status+0xd0/0x16c [<000000000048587a>] zfcp_erp_adapter_reopen+0x3a/0xb4 [<0000000000489812>] zfcp_fsf_req_send+0x166/0x180 [<000000000048c8d6>] zfcp_fsf_fcp_cmnd+0x272/0x408 [<000000000048f864>] zfcp_scsi_queuecommand+0x11c/0x1e0 [<00000000003f1f2a>] scsi_dispatch_cmd+0x1d6/0x324 [<00000000003f9910>] scsi_request_fn+0x42c/0x56c [<00000000003828ae>] __blk_run_queue+0x86/0x140 [<000000000037f742>] elv_insert+0x11a/0x208 [<000000000038104c>] blk_insert_cloned_request+0x84/0xe4 [<000003c0032b7c64>] dm_dispatch_request+0x6c/0x94 [dm_mod] [<000003c0032b7d5c>] map_request+0xd0/0x100 [dm_mod] [<000003c0032b9a78>] dm_request_fn+0xec/0x1bc [dm_mod] [<0000000000382c0e>] generic_unplug_device+0x5a/0x6c [<000003c0032b7f98>] dm_unplug_all+0x74/0x9c [dm_mod] [<00000000001d1272>] sync_page+0x76/0x9c [<00000000001d12ba>] sync_page_killable+0x22/0x60 [<000000000055a768>] __wait_on_bit_lock+0xc0/0x124 [<00000000001d1140>] __lock_page_killable+0x78/0x84 [<00000000001d351c>] generic_file_aio_read+0x5a4/0x7e8 [<0000000000228ec0>] do_sync_read+0xc8/0x12c [<0000000000229edc>] vfs_read+0xac/0x1ac [<000000000022a0d8>] SyS_read+0x58/0xa8 [<00000000001146de>] sysc_noemu+0x10/0x16 [<00000200000493c4>] 0x200000493c4 INFO: lockdep is turned off. Call zfcp_fsf_fcp_cmnd without the host_lock and disable the interrupts when acquiring the req_q_lock. According to the patch description in "[PATCH] Eliminate error handler overload of the SCSI serial number", the serial_number is not used, so simply drop the queuecommand wrapper function and run zfcp_scsi_queuecommand without holding the host_lock. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-11-18 13:53:18 +00:00
unsigned long flags;
if (unlikely(!(atomic_read(&zfcp_sdev->status) &
ZFCP_STATUS_COMMON_UNBLOCKED)))
return -EBUSY;
[SCSI] zfcp: Issue FCP command without holding SCSI host_lock Interrupting the connection to the FCP channel while I/O requests are being issued can lead to this deadlock. scsi_dispatch_cmd already holds the host_lock while the recovery trigger tries to acquire the host_lock again when iterating through the scsi_devices. INFO: lockdep is turned off. BUG: spinlock lockup on CPU#1, blast/9660, 0000000078f38878 CPU: 1 Not tainted 2.6.35.7SWEN2 #2 Process blast (pid: 9660, task: 0000000071f75940, ksp: 0000000074393ac0) 0000000074393640 00000000743935c0 0000000000000002 0000000000000000 0000000074393660 00000000743935d8 00000000743935d8 00000000005590c2 0000000000000000 0000000078f38878 0000000026ede800 0000000078f38878 000000000000000d 040000000000000c 0000000074393628 0000000000000000 0000000000000000 0000000000100b2a 00000000743935c0 0000000074393600 Call Trace: ([<0000000000100a32>] show_trace+0xee/0x144) [<00000000003be202>] do_raw_spin_lock+0x112/0x178 [<000000000055d408>] _raw_spin_lock_irqsave+0x90/0xb0 [<00000000003f1514>] __scsi_iterate_devices+0x38/0xbc [<00000000004849b0>] zfcp_erp_clear_adapter_status+0xd0/0x16c [<000000000048587a>] zfcp_erp_adapter_reopen+0x3a/0xb4 [<0000000000489812>] zfcp_fsf_req_send+0x166/0x180 [<000000000048c8d6>] zfcp_fsf_fcp_cmnd+0x272/0x408 [<000000000048f864>] zfcp_scsi_queuecommand+0x11c/0x1e0 [<00000000003f1f2a>] scsi_dispatch_cmd+0x1d6/0x324 [<00000000003f9910>] scsi_request_fn+0x42c/0x56c [<00000000003828ae>] __blk_run_queue+0x86/0x140 [<000000000037f742>] elv_insert+0x11a/0x208 [<000000000038104c>] blk_insert_cloned_request+0x84/0xe4 [<000003c0032b7c64>] dm_dispatch_request+0x6c/0x94 [dm_mod] [<000003c0032b7d5c>] map_request+0xd0/0x100 [dm_mod] [<000003c0032b9a78>] dm_request_fn+0xec/0x1bc [dm_mod] [<0000000000382c0e>] generic_unplug_device+0x5a/0x6c [<000003c0032b7f98>] dm_unplug_all+0x74/0x9c [dm_mod] [<00000000001d1272>] sync_page+0x76/0x9c [<00000000001d12ba>] sync_page_killable+0x22/0x60 [<000000000055a768>] __wait_on_bit_lock+0xc0/0x124 [<00000000001d1140>] __lock_page_killable+0x78/0x84 [<00000000001d351c>] generic_file_aio_read+0x5a4/0x7e8 [<0000000000228ec0>] do_sync_read+0xc8/0x12c [<0000000000229edc>] vfs_read+0xac/0x1ac [<000000000022a0d8>] SyS_read+0x58/0xa8 [<00000000001146de>] sysc_noemu+0x10/0x16 [<00000200000493c4>] 0x200000493c4 INFO: lockdep is turned off. Call zfcp_fsf_fcp_cmnd without the host_lock and disable the interrupts when acquiring the req_q_lock. According to the patch description in "[PATCH] Eliminate error handler overload of the SCSI serial number", the serial_number is not used, so simply drop the queuecommand wrapper function and run zfcp_scsi_queuecommand without holding the host_lock. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-11-18 13:53:18 +00:00
spin_lock_irqsave(&qdio->req_q_lock, flags);
if (atomic_read(&qdio->req_q_free) <= 0) {
atomic_inc(&qdio->req_q_full);
goto out;
}
if (scsi_cmnd->sc_data_direction == DMA_TO_DEVICE)
sbtype = SBAL_SFLAGS0_TYPE_WRITE;
req = zfcp_fsf_req_create(qdio, FSF_QTCB_FCP_CMND,
sbtype, adapter->pool.scsi_req);
if (IS_ERR(req)) {
retval = PTR_ERR(req);
goto out;
}
scsi_cmnd->host_scribble = (unsigned char *) req->req_id;
io = &req->qtcb->bottom.io;
req->status |= ZFCP_STATUS_FSFREQ_CLEANUP;
req->data = scsi_cmnd;
req->handler = zfcp_fsf_fcp_cmnd_handler;
req->qtcb->header.lun_handle = zfcp_sdev->lun_handle;
req->qtcb->header.port_handle = zfcp_sdev->port->handle;
io->service_class = FSF_CLASS_3;
io->fcp_cmnd_length = FCP_CMND_LEN;
if (scsi_get_prot_op(scsi_cmnd) != SCSI_PROT_NORMAL) {
io->data_block_length = scsi_cmnd->device->sector_size;
io->ref_tag_value = scsi_get_lba(scsi_cmnd) & 0xFFFFFFFF;
}
if (zfcp_fsf_set_data_dir(scsi_cmnd, &io->data_direction))
goto failed_scsi_cmnd;
BUILD_BUG_ON(sizeof(struct fcp_cmnd) > FSF_FCP_CMND_SIZE);
fcp_cmnd = &req->qtcb->bottom.io.fcp_cmnd.iu;
zfcp_fc_scsi_to_fcp(fcp_cmnd, scsi_cmnd);
scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled Since commit db007fc5e20c ("[SCSI] Command protection operation"), scsi_eh_prep_cmnd() saves scmd->prot_op and temporarily resets it to SCSI_PROT_NORMAL. Other FCP LLDDs such as qla2xxx and lpfc shield their queuecommand() to only access any of scsi_prot_sg...() if (scsi_get_prot_op(cmd) != SCSI_PROT_NORMAL). Do the same thing for zfcp, which introduced DIX support with commit ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support for DIF/DIX"). Otherwise, TUR SCSI commands as part of scsi_eh likely fail in zfcp, because the regular SCSI command with DIX protection data, that scsi_eh re-uses in scsi_send_eh_cmnd(), of course still has (scsi_prot_sg_count() != 0) and so zfcp sends down bogus requests to the FCP channel hardware. This causes scsi_eh_test_devices() to have (finish_cmds == 0) [not SCSI device is online or not scsi_eh_tur() failed] so regular SCSI commands, that caused / were affected by scsi_eh, are moved to work_q and scsi_eh_test_devices() itself returns false. In turn, it unnecessarily escalates in our case in scsi_eh_ready_devs() beyond host reset to finally scsi_eh_offline_sdevs() which sets affected SCSI devices offline with the following kernel message: "kernel: sd H:0:T:L: Device offlined - not ready after error recovery" Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support for DIF/DIX") Cc: <stable@vger.kernel.org> #2.6.36+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-07-28 10:30:51 +00:00
if ((scsi_get_prot_op(scsi_cmnd) != SCSI_PROT_NORMAL) &&
scsi_prot_sg_count(scsi_cmnd)) {
zfcp_qdio_set_data_div(qdio, &req->qdio_req,
scsi_prot_sg_count(scsi_cmnd));
retval = zfcp_qdio_sbals_from_sg(qdio, &req->qdio_req,
scsi_prot_sglist(scsi_cmnd));
if (retval)
goto failed_scsi_cmnd;
io->prot_data_length = zfcp_qdio_real_bytes(
scsi_prot_sglist(scsi_cmnd));
}
retval = zfcp_qdio_sbals_from_sg(qdio, &req->qdio_req,
scsi_sglist(scsi_cmnd));
if (unlikely(retval))
goto failed_scsi_cmnd;
zfcp_qdio_set_sbale_last(adapter->qdio, &req->qdio_req);
if (zfcp_adapter_multi_buffer_active(adapter))
zfcp_qdio_set_scount(qdio, &req->qdio_req);
retval = zfcp_fsf_req_send(req);
if (unlikely(retval))
goto failed_scsi_cmnd;
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
/* NOTE: DO NOT TOUCH req PAST THIS POINT! */
goto out;
failed_scsi_cmnd:
zfcp_fsf_req_free(req);
scsi_cmnd->host_scribble = NULL;
out:
[SCSI] zfcp: Issue FCP command without holding SCSI host_lock Interrupting the connection to the FCP channel while I/O requests are being issued can lead to this deadlock. scsi_dispatch_cmd already holds the host_lock while the recovery trigger tries to acquire the host_lock again when iterating through the scsi_devices. INFO: lockdep is turned off. BUG: spinlock lockup on CPU#1, blast/9660, 0000000078f38878 CPU: 1 Not tainted 2.6.35.7SWEN2 #2 Process blast (pid: 9660, task: 0000000071f75940, ksp: 0000000074393ac0) 0000000074393640 00000000743935c0 0000000000000002 0000000000000000 0000000074393660 00000000743935d8 00000000743935d8 00000000005590c2 0000000000000000 0000000078f38878 0000000026ede800 0000000078f38878 000000000000000d 040000000000000c 0000000074393628 0000000000000000 0000000000000000 0000000000100b2a 00000000743935c0 0000000074393600 Call Trace: ([<0000000000100a32>] show_trace+0xee/0x144) [<00000000003be202>] do_raw_spin_lock+0x112/0x178 [<000000000055d408>] _raw_spin_lock_irqsave+0x90/0xb0 [<00000000003f1514>] __scsi_iterate_devices+0x38/0xbc [<00000000004849b0>] zfcp_erp_clear_adapter_status+0xd0/0x16c [<000000000048587a>] zfcp_erp_adapter_reopen+0x3a/0xb4 [<0000000000489812>] zfcp_fsf_req_send+0x166/0x180 [<000000000048c8d6>] zfcp_fsf_fcp_cmnd+0x272/0x408 [<000000000048f864>] zfcp_scsi_queuecommand+0x11c/0x1e0 [<00000000003f1f2a>] scsi_dispatch_cmd+0x1d6/0x324 [<00000000003f9910>] scsi_request_fn+0x42c/0x56c [<00000000003828ae>] __blk_run_queue+0x86/0x140 [<000000000037f742>] elv_insert+0x11a/0x208 [<000000000038104c>] blk_insert_cloned_request+0x84/0xe4 [<000003c0032b7c64>] dm_dispatch_request+0x6c/0x94 [dm_mod] [<000003c0032b7d5c>] map_request+0xd0/0x100 [dm_mod] [<000003c0032b9a78>] dm_request_fn+0xec/0x1bc [dm_mod] [<0000000000382c0e>] generic_unplug_device+0x5a/0x6c [<000003c0032b7f98>] dm_unplug_all+0x74/0x9c [dm_mod] [<00000000001d1272>] sync_page+0x76/0x9c [<00000000001d12ba>] sync_page_killable+0x22/0x60 [<000000000055a768>] __wait_on_bit_lock+0xc0/0x124 [<00000000001d1140>] __lock_page_killable+0x78/0x84 [<00000000001d351c>] generic_file_aio_read+0x5a4/0x7e8 [<0000000000228ec0>] do_sync_read+0xc8/0x12c [<0000000000229edc>] vfs_read+0xac/0x1ac [<000000000022a0d8>] SyS_read+0x58/0xa8 [<00000000001146de>] sysc_noemu+0x10/0x16 [<00000200000493c4>] 0x200000493c4 INFO: lockdep is turned off. Call zfcp_fsf_fcp_cmnd without the host_lock and disable the interrupts when acquiring the req_q_lock. According to the patch description in "[PATCH] Eliminate error handler overload of the SCSI serial number", the serial_number is not used, so simply drop the queuecommand wrapper function and run zfcp_scsi_queuecommand without holding the host_lock. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-11-18 13:53:18 +00:00
spin_unlock_irqrestore(&qdio->req_q_lock, flags);
return retval;
}
static void zfcp_fsf_fcp_task_mgmt_handler(struct zfcp_fsf_req *req)
{
struct scsi_device *sdev = req->data;
struct fcp_resp_with_ext *fcp_rsp;
struct fcp_resp_rsp_info *rsp_info;
zfcp_fsf_fcp_handler_common(req, sdev);
fcp_rsp = &req->qtcb->bottom.io.fcp_rsp.iu;
rsp_info = (struct fcp_resp_rsp_info *) &fcp_rsp[1];
if ((rsp_info->rsp_code != FCP_TMF_CMPL) ||
(req->status & ZFCP_STATUS_FSFREQ_ERROR))
req->status |= ZFCP_STATUS_FSFREQ_TMFUNCFAILED;
}
/**
* zfcp_fsf_fcp_task_mgmt() - Send SCSI task management command (TMF).
* @sdev: Pointer to SCSI device to send the task management command to.
* @tm_flags: Unsigned byte for task management flags.
*
* Return: On success pointer to struct zfcp_fsf_req, %NULL otherwise.
*/
struct zfcp_fsf_req *zfcp_fsf_fcp_task_mgmt(struct scsi_device *sdev,
u8 tm_flags)
{
struct zfcp_fsf_req *req = NULL;
struct fcp_cmnd *fcp_cmnd;
struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);
struct zfcp_qdio *qdio = zfcp_sdev->port->adapter->qdio;
if (unlikely(!(atomic_read(&zfcp_sdev->status) &
ZFCP_STATUS_COMMON_UNBLOCKED)))
return NULL;
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_lock_irq(&qdio->req_q_lock);
if (zfcp_qdio_sbal_get(qdio))
goto out;
req = zfcp_fsf_req_create(qdio, FSF_QTCB_FCP_CMND,
SBAL_SFLAGS0_TYPE_WRITE,
qdio->adapter->pool.scsi_req);
if (IS_ERR(req)) {
req = NULL;
goto out;
}
req->data = sdev;
req->handler = zfcp_fsf_fcp_task_mgmt_handler;
req->qtcb->header.lun_handle = zfcp_sdev->lun_handle;
req->qtcb->header.port_handle = zfcp_sdev->port->handle;
req->qtcb->bottom.io.data_direction = FSF_DATADIR_CMND;
req->qtcb->bottom.io.service_class = FSF_CLASS_3;
req->qtcb->bottom.io.fcp_cmnd_length = FCP_CMND_LEN;
zfcp_qdio_set_sbale_last(qdio, &req->qdio_req);
fcp_cmnd = &req->qtcb->bottom.io.fcp_cmnd.iu;
zfcp_fc_fcp_tm(fcp_cmnd, sdev, tm_flags);
zfcp_fsf_start_timer(req, ZFCP_FSF_SCSI_ER_TIMEOUT);
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
if (!zfcp_fsf_req_send(req)) {
/* NOTE: DO NOT TOUCH req, UNTIL IT COMPLETES! */
goto out;
scsi: zfcp: fix request object use-after-free in send path causing seqno errors With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> #5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-07-02 21:02:00 +00:00
}
zfcp_fsf_req_free(req);
req = NULL;
out:
[SCSI] zfcp: Change spin_lock_bh to spin_lock_irq to fix lockdep warning With the change to use the data on the SCSI device, iterating through all LUNs/scsi_devices takes the SCSI host_lock. This triggers warnings from the lock dependency checker: ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #97 --------------------------------------------------------- chchp/3224 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->req_q_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: [ 24.972394] 2 locks held by chchp/3224: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-....}, at: [<0000000000490302>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.34.1 #98 --------------------------------------------------------- chchp/3235 just changed the state of lock: (&(shost->host_lock)->rlock){-.-...}, at: [<00000000003a73f4>] __scsi_iterate_devices+0x38/0xbc but this lock took another, HARDIRQ-unsafe lock in the past: (&(&qdio->stat_lock)->rlock){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 2 locks held by chchp/3235: #0: (&(sch->lock)->rlock){-.-...}, at: [<0000000000401efa>] do_IRQ+0xb2/0x1e4 #1: (&adapter->port_list_lock){.-.-..}, at: [<00000000004902f6>] zfcp_erp_modify_adapter_status+0x9e/0x16c [...] To stop this warning, change the request queue lock to disable irqs, not only softirq. The changes are required only outside of the critical "send fcp command" path. Reviewed-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-08 12:39:57 +00:00
spin_unlock_irq(&qdio->req_q_lock);
return req;
}
/**
* zfcp_fsf_reqid_check - validate req_id contained in SBAL returned by QDIO
* @qdio: pointer to struct zfcp_qdio
* @sbal_idx: response queue index of SBAL to be processed
*/
void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx)
{
struct zfcp_adapter *adapter = qdio->adapter;
struct qdio_buffer *sbal = qdio->res_q[sbal_idx];
struct qdio_buffer_element *sbale;
struct zfcp_fsf_req *fsf_req;
unsigned long req_id;
int idx;
for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) {
sbale = &sbal->element[idx];
req_id = (unsigned long) sbale->addr;
fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id);
if (!fsf_req) {
/*
* Unknown request means that we have potentially memory
* corruption and must stop the machine immediately.
*/
zfcp_qdio_siosl(adapter);
panic("error: unknown req_id (%lx) on adapter %s.\n",
req_id, dev_name(&adapter->ccw_device->dev));
}
zfcp_fsf_req_complete(fsf_req);
if (likely(sbale->eflags & SBAL_EFLAGS_LAST_ENTRY))
break;
}
}