linux

Author	SHA1	Message	Date
Robert Love	4469c195da	[SCSI] fcoe: Change fcoe receive thread nice value from 19 (lowest priority) to -20 This change makes the fcoe Rx threads have the same nice value as lpfc and qla2xxx Rx threads. Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-10 09:10:02 -05:00
Chris Leech	55c8bafba5	[SCSI] fcoe: fix handling of pending queue, prevent out of order frames (v3) In fcoe_check_wait_queue() the queue length could temporarily drop to 0, before the last frame was successfully sent. This resulted in out of order data frames within a single sequence, leading to IO timeout errors. This builds on the approach from Vasu Dev to only fix the queue management in fcoe_check_wait_queue, where my first patch added locking to the transmit path even when the pending queue was not in use. This patch continues to use fcoe_pending_queue.qlen instead of introducing a new length counter, but takes precautions to ensure it never drops to 0 before the final frame in the queue has successfully been passed to the netdev qdisc layer. It also includes some cleanup of fcoe_check_wait_queue and removes the fcoe_insert_wait_queue(_head) wrapper functions. Signed-off-by: Chris Leech <christopher.leech@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-10 09:09:40 -05:00
Vasu Dev	c826a31457	[SCSI] fcoe: Out of order tx frames was causing several check condition SCSI status frames followed by these errors in log. [sdp] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK [sdp] Sense Key : Aborted Command [current] [sdp] Add. Sense: Data phase error This was causing some test apps to exit due to write failure under heavy load. This was due to a race around adding and removing tx frame skb in fcoe_pending_queue, Chris Leech helped me to find that brief unlocking period when pulling skb from fcoe_pending_queue in various contexts (fcoe_watchdog and fcoe_xmit) and then adding skb back into fcoe_pending_queue up on a failed fcoe_start_io could change skb/tx frame order in fcoe_pending_queue. Thanks Chris. This patch allows only single context to pull skb from fcoe_pending_queue at any time to prevent above described ordering issue/race by use of fcoe_pending_queue_active flag. This patch simplified fcoe_watchdog with modified fcoe_check_wait_queue by use of FCOE_LOW_QUEUE_DEPTH instead previously used several conditionals to clear and set lp->qfull. I think FCOE_MAX_QUEUE_DEPTH with FCOE_LOW_QUEUE_DEPTH will work better in re/setting lp->qfull and these could be fine tuned for performance. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-10 09:09:21 -05:00
Roel Kluin	e904158159	[SCSI] fcoe: fix kfree(skb) Use kfree_skb instead of kfree for struct sk_buff pointers. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-10 09:09:01 -05:00
Yi Zou	422819cfa3	[SCSI] libfc: do not change the fh_rx_id of a recevied frame We shouldn't be altering inbound frames. Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-10 09:06:36 -05:00
Robert Love	03ec862dff	[SCSI] fcoe: Correct fcoe_transports initialization vs. registration The registration function shouldn't initialize the mutex or list head. The fcoe SW transport should initialize itself before registering. Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-10 09:06:17 -05:00
Robert Love	a468f328ad	[SCSI] fcoe: Use setup_timer() and mod_timer() Use helper functions for watchdog timer setup. Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-10 09:05:57 -05:00
Robert Love	fc47ff6b1b	[SCSI] libfc, fcoe: Remove unnecessary cast by removing inline wrapper Comment from "Andrew Morton <akpm@linux-foundation.org>" > +{ > + return (struct fcoe_softc )lport_priv(lp); unneeded/undesirable cast of void. There are probably zillions of instances of this - there always are. This whole inline function was unnecessary. The FCoE layer knows that it's data structure is stored in the lport private data, it can just access it from lport_priv(). Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-10 09:05:35 -05:00
Robert Love	b2ab99c9a3	[SCSI] libfc, fcoe: Cleanup function formatting and minor typos 1) There were a few functions with a strange layout, i.e. all arguments on the second line, when not necessary. Where ever possible I moved the return value to the same line as the function name. However, when the line was too long to have a single argument on the same line I moved the return value to above line. For example: <short return> <function name>(<arg 1>, <arg2>) and <very long return value> <function name>(<arg1>, <arg2>) 2) Removed one extra whitespace line 3) Fixed two typos Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-10 09:05:09 -05:00
Robert Love	34f42a070f	[SCSI] libfc, fcoe: Fix kerneldoc comments 1) Added '()' for function names in kerneldoc comments 2) Changed comment bookends from '*/' to '/'. The comment on the the mailing list was that '/' "is consistently unconventional. Not wrong, just odd." The Documentation/kernel-doc-nano-HOWTO.txt states that kerneldoc comment blocks should end with '/' but most (if not all) instance I found under drivers/scsi/ were only using the '*/' so I converted to that style. 3) Removed incorrect linebreaks in kerneldoc comments where found 4) Removed a few unnecessary blank comment lines in kerneldoc comment blocks Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-10 09:04:40 -05:00
Robert Love	efaf5c085d	[SCSI] libfc: check for err when recv and state is incorrect If we've just created an interface and the an rport is logging in we may have a request on the wire (say PRLI). If we destroy the interface, we'll go through each rport on the disc->rports list and set each rport's state to NONE. Then the lport will reset the EM. The EM reset will send a CLOSED event to the prli_resp() handler which will notice that the state != PRLI. In this case it frees the frame pointer, decrements the refcount and unlocks the rport. The problem is that there isn't a frame in this case. It's just a pointer with an embedded error code. The free causes an Oops. This patch moves the error checking to be before the state checking. Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-06 15:44:36 -06:00
Robert Love	d3b33327ca	[SCSI] libfc: rename rp to rdata in fc_disc_new_target() Just rename the variable as per our naming convention. Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-06 15:41:37 -06:00
Robert Love	23f11f9076	[SCSI] libfc: correct RPORT_TO_PRIV usage We only need to use this macro when assigning a value to rport->dd_data. All other accesses should just use dd_data. Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-06 15:41:16 -06:00
Robert Love	5101ff99f5	[SCSI] libfc: Don't violate transport template for rogue port creation Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-06 15:41:01 -06:00
Steve Ma	f7db2c150c	[SCSI] libfc: exch mgr is freed while lport still retrying sequences When a sequence cannot be delivered to the target, the local port will schedule retries, While this process is in progress, if we destroy the FCoE interface, the fcoe_sw_destroy routine is entered, and the fc_exch_mgr_free(lp->emp) is called. Thus if fc_exch_alloc() is called when retrying the sequence, the mempool_alloc() will fail to allocate the exchange because the mempool of the exchange manager has already been released. This patch is to cancel any pending retry work of the local port before we start to destroy the interface. Also, when resetting the local port, we should also stop the scheduled pending retries. Signed-off-by: Steve Ma <steve.ma@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-06 15:40:45 -06:00
Vasu Dev	26d9cab558	[SCSI] libfc: fixed a read IO data integrity issue when a IO data frame lost The fc_fcp_complete_locked detected data underrun in this case and set the FC_DATA_UNDRUN but that was ignored by fc_io_compl for all cases including read underrun. Added code to not to ignore FC_DATA_UNDRUN for read IO and instead suggested scsi-ml to retry cmd to recover from lost data frame. Not sure if it is okay to ignore FC_DATA_UNDRUN for other case, so let code as is for other cases but removed or-ing with zero valued fsp->cdb_status for those cases. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-06 15:40:06 -06:00
Chris Leech	6755db1cd4	[SCSI] libfc: rport retry on LS_RJT from certain ELS This allows any rport ELS to retry on LS_RJT. The rport error handling would only retry on resource allocation failures and exchange timeouts. I have a target that will occasionally reject PLOGI when we do a quick LOGO/PLOGI. When a critical ELS was rejected, libfc would fail silently leaving the rport in a dead state. The retry count and delay are managed by fc_rport_error_retry. If the retry count is exceeded fc_rport_error will be called. When retrying is not the correct course of action, fc_rport_error can be called directly. Signed-off-by: Chris Leech <christopher.leech@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-06 15:39:34 -06:00
Vasu Dev	bc0e17f691	[SCSI] libfc, fcoe: fixed locking issues with lport->lp_mutex around lport->link_status The fcoe_xmit could call fc_pause in case the pending skb queue len is larger than FCOE_MAX_QUEUE_DEPTH, the fc_pause was trying to grab lport->lp_muex to change lport->link_status and that had these issues :- 1. The fcoe_xmit was getting called with bh disabled, thus causing "BUG: scheduling while atomic" when grabbing lport->lp_muex with bh disabled. 2. fc_linkup and fc_linkdown function calls lport_enter function with lport->lp_mutex held and these enter function in turn calls fcoe_xmit to send lport related FC frame, e.g. fc_linkup => fc_lport_enter_flogi to send flogi req. In this case grabbing the same lport->lp_mutex again in fc_puase from fcoe_xmit would cause deadlock. The lport->lp_mutex was used for setting FC_PAUSE in fcoe_xmit path but FC_PAUSE bit was not used anywhere beside just setting and clear this bit in lport->link_status, instead used a separate field qfull in fc_lport to eliminate need for lport->lp_mutex to track pending queue full condition and in turn avoid above described two locking issues. Also added check for lp->qfull in fc_fcp_lport_queue_ready to trigger SCSI_MLQUEUE_HOST_BUSY when lp->qfull is set to prevent more scsi-ml cmds while lp->qfull is set. This patch eliminated FC_LINK_UP and FC_PAUSE and instead used dedicated fields in fc_lport for this, this simplified all related conditional code. Also removed fc_pause and fc_unpause functions and instead used newly added lport->qfull directly in fcoe. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-06 15:37:49 -06:00
Vasu Dev	a7e84f2b83	[SCSI] libfc: fixed a soft lockup issue in fc_exch_recv_abts The fc_seq_start_next grabs ep->ex_lock but this lock was already held here, so instead called fc_seq_start_next_locked to avoid soft lockup. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-06 15:37:23 -06:00
Vasu Dev	78342da368	[SCSI] libfc: handle RRQ exch timeout Cleanup exchange held due to RRQ when RRQ exch times out, in this case the ABTS is already done causing RRQ req therefore proceeding with cleanup in fc_exch_rrq_resp should be okay to restore exch resource. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-06 15:36:56 -06:00
Abhijeet Joglekar	571f824c3c	[SCSI] libfc: when rport goes away (re-plogi), clean up exchanges to/from rport When a rport goes away, libFC does a plogi which will reset exchanges at the rport. Clean exchanges at our end, both in transport and libFC. If transport hooks into exch_mgr_reset, it will call back into fc_exch_mgr_reset() to clean up libFC exchanges. Signed-off-by: Abhijeet Joglekar <abjoglek@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-06 15:36:28 -06:00
Abhijeet Joglekar	1f6ff364ce	[SCSI] libfc: Pass lport in exch_mgr_reset fc_exch_mgr structure is private to fc_exch.c. To export exch_mgr_reset to transport, transport needs access to the exch manager. Change exch_mgr_reset to use lport param which is the shared structure between libFC and transport. Alternatively, fc_exch_mgr definition can be moved to libfc.h so that lport can be accessed from mp*. Signed-off-by: Abhijeet Joglekar <abjoglek@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-06 15:35:47 -06:00
Matthew Wilcox	33dd6f92a1	[SCSI] sd: Don't try to spin up drives that are connected to an inactive port We currently try to spin up drives connected to standby and unavailable ports. This will never succeed and wastes a lot of time. Fail quickly if the sense data reports the port is in standby or unavailable state. Reported-by: Narayanan Rengarajan <narayanan.rengarajan@hp.com> Tested-by: Narayanan Rengarajan <narayanan.rengarajan@hp.com> Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-05 10:34:37 -06:00
James Bottomley	126c098296	[SCSI] fix ABORTED_COMMAND looping forever problem Instead of terminating after five retries, commands terminated by ABORTED_COMMAND sense are retrying forever. The problem was introduced by: commit `b60af5b0ad` Author: Alan Stern <stern@rowland.harvard.edu> Date: Mon Nov 3 15:56:47 2008 -0500 [SCSI] simplify scsi_io_completion() Which introduced an error whereby ABORTED_COMMAND now gets erroneously retried in scsi_io_completion. Fix this by returning the behaviour back to the default no retry. Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com> Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-21 20:29:38 -06:00
Tejun Heo	4034cc6815	[SCSI] sd: revive sd_index_lock Commit `f27bac2761` which converted sd to use ida instead of idr incorrectly removed sd_index_lock around id allocation and free. idr/ida do have internal locks but they protect their free object lists not the allocation itself. The caller is responsible for that. This missing synchronization led to the same id being assigned to multiple devices leading to oops. Reported and tracked down by Stuart Hayes of Dell. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Stable Tree <stable@kernel.org> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-21 20:29:38 -06:00
Karen Xie	b7e7bd3446	[SCSI] cxgb3i: update the driver version to 1.0.1 Signed-off-by: Karen Xie <kxie@chelsio.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-21 20:29:38 -06:00
Karen Xie	992040f540	[SCSI] cxgb3i: added missing include in cxgb3i_ddp.h Signed-off-by: Karen Xie <kxie@chelsio.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-21 20:29:37 -06:00
Karen Xie	f62d0896e6	[SCSI] cxgb3i: Outgoing pdus need to observe skb's MAX_SKB_FRAGS Need to make sure the outgoing pdu can fit into a single skb. When calulating the max. outgoing pdu payload size, take into consideration of - data can be held in the skb's fragment list, assume 512 bytes per fragment, and - data can be held in the headroom. Signed-off-by: Karen Xie <kxie@chelsio.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-21 20:29:37 -06:00
Karen Xie	949847d195	[SCSI] cxgb3i: added per-task data to track transmit progress added per-task struct cxgb3i_task_data to track the data transmiting progress and the state of the pdus to be transmitted. Signed-off-by: Karen Xie <kxie@chelsio.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-21 20:29:37 -06:00
Karen Xie	1648b11ea7	[SCSI] cxgb3i: transmit work-request fixes - resize the work-request credit array to be based on skb's MAX_SKB_FRAGS. - split the skb cb into tx and rx portion - increase the default transmit window to 128K. - stop queueing up the outgoing pdus if transmit window is full. Signed-off-by: Karen Xie <kxie@chelsio.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-21 20:29:36 -06:00
HighPoint Linux Team	b73a774942	[SCSI] hptiop: Add new PCI device ID Signed-off-by: HighPoint Linux Team <linux@highpoint-tech.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-21 20:29:36 -06:00
Andrew Vasquez	822c05b633	[SCSI] qla2xxx: Update version number to 8.03.00-k3. Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-10 11:15:20 -05:00
Andrew Vasquez	9088608e00	[SCSI] qla2xxx: Mask out 'reserved' bits while processing FLT regions. Bits 31-8 are marked as reserved and should be ignored while interpreting a region's code. Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-10 11:15:20 -05:00
Anirban Chakraborty	cf5a163127	[SCSI] qla2xxx: Correct slab-error overwrite during vport creation and deletion. The clearing of a vha's req_ques were overrunning during vport creation. During deletion, vport queues should be torn-down after all cleanup has occurred. Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com> Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-10 11:15:20 -05:00
Andrew Vasquez	8a659571ec	[SCSI] qla2xxx: Properly acknowledge IDC notification messages. To ensure smooth operations amongst the FCoE and NIC side components of the ISP81xx chip, the FCoE driver (qla2xxx) must ensure the 10gb NIC driver (qlge) does not timeout waiting for IDC (Inter-Driver Communication) acknowledgments. The acknowledgment requirements are trivial -- a simple mirroring of incoming mailbox registers during the AEN to a process-context capable mailbox command. Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-10 11:15:20 -05:00
Anirban Chakraborty	618a752319	[SCSI] qla2xxx: Remove interrupt request bit check in the response processing path in multiq mode. Correct response-queue-0 processing by instructing the firmware to run with interrupt-handshaking disabled, similarly to what is now done for all non-0 response queues. Since all response-queues now run in the same mode, the driver no longer needs the hot-path 'is-disabled-HCCR' test. Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com> Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-10 11:15:19 -05:00
Julia Lawall	e916141c68	[SCSI] lpfc: introduce missing kfree Error handling code following a kmalloc should free the allocated data. The semantic match that finds the problem is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @r exists@ local idexpression x; statement S; expression E; identifier f,l; position p1,p2; expression ptr != NULL; @@ ( if ((x@p1 = \(kmalloc\\|kzalloc\\|kcalloc\)(...)) == NULL) S \| x@p1 = \(kmalloc\\|kzalloc\\|kcalloc\)(...); ... if (x == NULL) S ) <... when != x when != if (...) { <+...x...+> } x->f = E ...> ( return \(0\\|<+...x...+>\\|ptr\); \| return@p2 ...; ) @script:python@ p1 << r.p1; p2 << r.p2; @@ print " file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Acked-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-10 11:15:19 -05:00
Mike Christie	308cec14e6	[SCSI] libiscsi: Fix scsi command timeout oops in iscsi_eh_timed_out Yanling Qi from LSI found the root cause of the panic, below is his analysis: Problem description: the open iscsi driver installs eh_timed_out handler to the blank_transport_template of the scsi middle level that causes panic of timed out command of other host Here are the details Iscsi Session creation During iscsi session creation time, the iscsi_tcp_session_create() of iscsi_tpc.c will create a scsi-host for the session. See the statement marked with the label A. The statement B replaces the shost->transportt point with a local struct variable. static struct iscsi_cls_session * iscsi_tcp_session_create(struct iscsi_endpoint ep, uint16_t cmds_max, uint16_t qdepth, uint32_t initial_cmdsn, uint32_t hostno) { struct iscsi_cls_session cls_session; struct iscsi_session session; struct Scsi_Host shost; int cmd_i; if (ep) { printk(KERN_ERR "iscsi_tcp: invalid ep %p.\n", ep); return NULL; } A shost = iscsi_host_alloc(&iscsi_sht, 0, qdepth); if (!shost) return NULL; B shost->transportt = iscsi_tcp_scsi_transport; shost->max_lun = iscsi_max_lun; Please note the scsi host is allocated by invoking isccsi_host_alloc() in libiscsi.c Polluting the middle level blank_transport_template in iscsi_host_alloc() of libiscsi.c The iscsi_host_alloc() invokes the middle level function scsi_host_alloc() in hosts.c for allocating a scsi_host. Then the statement marked with C assigns the iscsi_eh_cmd_timed_out handler to the eh_timed_out callback function. struct Scsi_Host iscsi_host_alloc(struct scsi_host_template sht, int dd_data_size, uint16_t qdepth) { struct Scsi_Host shost; struct iscsi_host ihost; shost = scsi_host_alloc(sht, sizeof(struct iscsi_host) + dd_data_size); if (!shost) return NULL; C shost->transportt->eh_timed_out = iscsi_eh_cmd_timed_out; Please note the shost->transport is the middle level blank_transport_template as shown in the code segment below. We see two problems here. 1. iscsi_eh_cmd_timed_out is installed to the blank_transport_template that will cause some body else problem. 2. iscsi_eh_cmd_timed_out will never be invoked when iscsi command gets timeout because the statement B resets the pointer. Middle level blank_transport_template In the middle level function scsi_host_alloc() of hosts.c, the middle level assigns a blank_transport_template for those hosts not implementing its transport layer. All HBAs without supporting a specific scsi_transport will share the middle level blank_transport_template. Please see the statement D struct Scsi_Host scsi_host_alloc(struct scsi_host_template sht, int privsize) { struct Scsi_Host shost; gfp_t gfp_mask = GFP_KERNEL; int rval; if (sht->unchecked_isa_dma && privsize) gfp_mask \|= __GFP_DMA; shost = kzalloc(sizeof(struct Scsi_Host) + privsize, gfp_mask); if (!shost) return NULL; shost->host_lock = &shost->default_lock; spin_lock_init(shost->host_lock); shost->shost_state = SHOST_CREATED; INIT_LIST_HEAD(&shost->__devices); INIT_LIST_HEAD(&shost->__targets); INIT_LIST_HEAD(&shost->eh_cmd_q); INIT_LIST_HEAD(&shost->starved_list); init_waitqueue_head(&shost->host_wait); mutex_init(&shost->scan_mutex); shost->host_no = scsi_host_next_hn++; /* XXX(hch): still racy / shost->dma_channel = 0xff; / These three are default values which can be overridden / shost->max_channel = 0; shost->max_id = 8; shost->max_lun = 8; / Give each shost a default transportt / D shost->transportt = &blank_transport_template; Why we see panic at iscsi_eh_cmd_timed_out() The mpp virtual HBA doesn’t have a specific scsi_transport. Therefore, the blank_transport_template will be assigned to the virtual host of the MPP virtual HBA by SCSI middle level. Please note that the statement C has assigned iscsi-transport eh_timedout handler to the blank_transport_template. When a mpp virtual command gets timedout, the iscsi_eh_cmd_timed_out() will be invoked to handle mpp virtual command timeout from the middle level scsi_times_out() function of the scsi_error.c. enum blk_eh_timer_return scsi_times_out(struct request req) { struct scsi_cmnd scmd = req->special; enum blk_eh_timer_return (eh_timed_out)(struct scsi_cmnd ); enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED; scsi_log_completion(scmd, TIMEOUT_ERROR); if (scmd->device->host->transportt->eh_timed_out) E eh_timed_out = scmd->device->host->transportt->eh_timed_out; else if (scmd->device->host->hostt->eh_timed_out) eh_timed_out = scmd->device->host->hostt->eh_timed_out; else eh_timed_out = NULL; if (eh_timed_out) { rtn = eh_timed_out(scmd); It is very easy to understand why we get panic in the iscsi_eh_cmd_timed_out(). A scsi_cmnd from a no-iscsi device definitely can not resolve out a session and session->lock. The panic can be happed anywhere during the differencing. static enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd scmd) { struct iscsi_cls_session cls_session; struct iscsi_session session; struct iscsi_conn *conn; enum blk_eh_timer_return rc = BLK_EH_NOT_HANDLED; cls_session = starget_to_session(scsi_target(scmd->device)); session = cls_session->dd_data; debug_scsi("scsi cmd %p timedout\n", scmd); spin_lock(&session->lock); This patch fixes the problem by moving the setting of the iscsi_eh_cmd_timed_out to iscsi_add_host, which is after the LLDs have set their transport template to shost->transportt. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-10 11:15:19 -05:00
Shyam_Iyer@Dell.com	7f977ddd0e	[SCSI] qla2xxx: fix Kernel Panic with Qlogic 2472 Card. Kernel Panic is observed with a Qlogic 2472 Card is plugged into the system and the qla2xxx driver is loaded: QLogic Fibre Channel HBA Driver: 8.02.01.02.11.0-k9 vendor=8086 device=3410 qla2xxx 0000:05:00.0: PCI INT A -> GSI 40 (level, low) -> IRQ 40 qla2xxx 0000:05:00.0: Found an ISP2432, irq 40, iobase 0xffffc2001091c000 qla2xxx 0000:05:00.0: Configuring PCI space... qla2xxx 0000:05:00.0: setting latency timer to 64 qla2xxx 0000:05:00.0: Configure NVRAM parameters... BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 IP: [<ffffffff8036319a>] strncpy+0x5/0x1e PGD 7c564067 PUD 78d8c067 PMD 0 Oops: 0000 [1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:1d.1/usb6/6-2/6-2:1.1/input/input4/event 4/dev CPU 1 Modules linked in: qla2xxx(+) squashfs usb_storage scsi_transport_fc scsi_tgt parport_pc parport arc4 ecb crypto_blkcipher acpi_cpufreq fan loop nfs nfs_acl lockd sunrpc nls_iso8859_1 nls_cp437 ipv6 af_packet st sr_mod ide_disk ide_cd_mod ide_core cdrom usbhid hid ff_memless sg sd_mod crc_t10dif uhci_hcd mptsas mptscsih ehci_hcd mptbase scsi_transport_sas rtc_cmos rtc_core rtc_lib usbcore scsi_mod thermal bnx2 button processor thermal_sys hwmon edd Supported: Yes Pid: 4415, comm: insmod Not tainted 2.6.27.13-1-default #1 RIP: 0010:[<ffffffff8036319a>] [<ffffffff8036319a>] strncpy+0x5/0x1e RSP: 0018:ffff88007b04fbc0 EFLAGS: 00010202 RAX: 00000000000000b7 RBX: ffff88007b9641e0 RCX: ffff88007c1b2ad7 RDX: 000000000000004f RSI: 0000000000000000 RDI: ffff88007c1b2ad7 RBP: ffff88007c1b0620 R08: 0000000000000010 R09: 0000000100000000 R10: 0000000000000046 R11: ffffffff803651c6 R12: ffff88007b074000 R13: ffff88007b964000 R14: ffff88007c1b2ac6 R15: 0000000000000000 FS: 00007f91a6c366f0(0000) GS:ffff88007dbeee40(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 000000007bd7c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process insmod (pid: 4415, threadinfo ffff88007b04e000, task ffff880078586180) Stack: ffffffffa02d82c4 0000000000002432 ffff88007d385000 ffff88007c1b0620 ffff88007c1b0620 ffff88007c1b0000 ffff88007d385000 0000000000002432 ffffffffa02dcb1e 0000000000002432 ffffc2001091c000 ffff88007c1b0620 Call Trace: [<ffffffffa02d82c4>] qla24xx_nvram_config+0x385/0x6c2 [qla2xxx] [<ffffffffa02dcb1e>] qla2x00_initialize_adapter+0x169/0x383 [qla2xxx] [<ffffffffa02f2040>] qla2x00_probe_one+0x6bc/0x9c6 [qla2xxx] [<ffffffff8037346f>] pci_device_probe+0xb8/0x105 [<ffffffff803e5a27>] really_probe+0xdd/0x1e5 [<ffffffff803e5c14>] __driver_attach+0x46/0x6d [<ffffffff803e51e1>] bus_for_each_dev+0x44/0x78 [<ffffffff803e4ac7>] bus_add_driver+0xef/0x235 [<ffffffff803e5dd8>] driver_register+0xa2/0x11f [<ffffffff803736fd>] __pci_register_driver+0x5d/0x90 [<ffffffffa0308126>] qla2x00_module_init+0x126/0x159 [qla2xxx] [<ffffffff80209041>] _stext+0x41/0x110 [<ffffffff80260abd>] sys_init_module+0xa0/0x1ba [<ffffffff8020bfbb>] system_call_fastpath+0x16/0x1b [<00007f91a679b76a>] 0x7f91a679b76a Code: ff c1 41 39 c0 75 05 45 85 c0 75 bf 41 29 c0 44 89 c0 c3 31 d2 8a 04 16 88 04 17 48 ff c2 84 c0 75 f3 48 89 f8 c3 48 89 f9 eb 10 <8a> 06 3c 01 88 01 48 83 de ff 48 ff c1 48 ff ca 48 85 d2 75 eb RIP [<ffffffff8036319a>] strncpy+0x5/0x1e RSP <ffff88007b04fbc0> CR2: 0000000000000000 ---[ end trace 829d7d78dfafb785 ]--- The attached patch fixes the issue. Signed-off-by: Shyam Iyer <shyam_iyer@dell.com> Acked-by: Seokmann Ju <Seokmann.ju@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-10 11:15:18 -05:00
Brian King	14ae6faca1	[SCSI] ibmvfc: Increase cancel timeout During cancel testing it has been shown that 15 seconds is not nearly long enough for the VIOS to respond to a cancel under loaded situations. Increasing this timeout to 60 seconds allows time for the VIOS to cancel the outstanding commands and prevents us from escalating to a full host reset, which can take much longer. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-10 11:15:18 -05:00
Brian King	0883e3b3a8	[SCSI] ibmvfc: Fix rport relogin The ibmvfc driver has a bug in its SCN handling. If it receives an ELS event such asn an N-Port SCN event or an unsolicited PLOGI, or any other SCN event which causes ibmvfc_reinit_host to be called, it is possible that we will call fc_remote_port_add for a target that already has an rport added, which can result in duplicate rports getting created for the same targets. Fix this by calling fc_remote_port_rolechg in this scenario instead to report any possible role change that may have occurred. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-10 11:15:18 -05:00
Brian King	d4b17a20f3	[SCSI] ibmvfc: Fix command timeout errors Currently the ibmvfc driver sets the IBMVFC_CLASS_3_ERR flag in the VFC Frame if both the adapter and the device claim support for Class 3. However, this bit actually refers to Class 3 Error Recovery, which is currently not supported by the VIOS. Setting this bit can cause lots of command timeout responses from the VIOS resulting in general instability. Fix this by never setting this bit. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-10 11:15:17 -05:00
Martin Peschke	76e3a19d06	[SCSI] sg: fix device number in blktrace data Hi, we have run into an issue with blktrace being started for sg devices. Please apply. Thanks, Martin From: Martin Peschke <mpeschke@linux.vnet.ibm.com> The device number denoting a generic SCSI devices (sg) in a blktrace trace is broken; major and minor are always 0. It looks like sdp->device->sdev_gendev.devt is not initialized properly. The fix below uses other data to make up a valid device number, similar to the way an sg device number is generated for sysfs output. Reported-by: Stefan Raspl <raspl@linux.vnet.ibm.com> Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-10 11:15:17 -05:00
James Smart	c2f9e49f9b	[SCSI] scsi_scan: add missing interim SDEV_DEL state if slave_alloc fails We were running i/o and performing a bunch of hba resets in a loop. This forces a lot of target removes and then rescans. Since the resets are occuring during scan it's causing the scan i/o to timeout, invoking error recovery, etc. We end up getting some nasty crashing in scsi_scan.c due to references to old sdevs that are failing but had some lingering references that kept them around. Fix by setting device state to SDEV_DEL if the LLD's slave_alloc fails. Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-10 11:15:17 -05:00
Robert Jennings	e637d55319	[SCSI] ibmvscsi: Correct DMA mapping leak The ibmvscsi client driver is not unmapping the SCSI command after encountering a DMA mapping error while trying to map an indirect scattergather list for the event pool. This leads to a leak of DMA entitlement that could result in the device failing future DMA operations in a CMO environment. Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-10 11:15:17 -05:00
Brian King	64b840dd88	[SCSI] ibmvfc: Fix DMA mapping leak on memory allocation failure There is currently a DMA mapping leak that can occur in the ibmvfc driver if we fail to allocate a scatterlist. Fix this by unmapping the scatterlist in the failure path. Additionally, only log an error for a scatterlist allocation failure if the log level is greater than the default, since this can occur when running Active Memory Sharing and this is not considered an error. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-01-25 08:14:53 -06:00
Andrew Vasquez	f9932deb99	[SCSI] qla2xxx: Update version number to 8.03.00-k2. Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-01-25 07:59:31 -06:00
Seokmann Ju	3c01b4f9fb	[SCSI] qla2xxx: Add checks for a valid fcport in dev-loss-tmo/terminate_rport_io callbacks. Commit `f78badb1ae` ([SCSI] fc transport: pre-emptively terminate i/o upon dev_loss_tmo timeout) changed the callback semantics of dev_loss_tmo and terminate_rport_io such that repeated calls could be made. This could result in the the driver using stale (NULLed-out, in dev_loss_tmo) data from the rport. Correct this by addint a simple check to ensure a valid fcport is attached. Signed-off-by: Seokmann Ju <seokmann.ju@qlogic.com> Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-01-25 07:59:12 -06:00
Andrew Vasquez	53303c42d5	[SCSI] qla2xxx: Correct regression in DMA-mask setting prior to allocations. Jeremy Higdon noted (http://marc.info/?l=linux-scsi&m=123262143131788&w=2) that the rework done in commit `e315cd28b9` was not setting the proper consistent and streaming DMA masks prior to memory allocations. Correct this and remove the unnecessary prototype. Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-01-25 07:58:55 -06:00
Joe Carnuccio	b872ca4081	[SCSI] qla2xxx: Correct descriptions in flash manipulation routines. When clearing the flash device's SR, the comment is incorrect... clearing the SR is 2 steps: 1. the SR protect bit is 1, so the first write zero clears only that bit, 2. the SR protect bit is now 0, so the next write zero clears the remaining bits. The sector erase debug print more correctly identifies that the erase failed. Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-01-25 07:58:37 -06:00

1 2 3 4 5 ...

4821 Commits