Commit Graph

1393 Commits

Author SHA1 Message Date
Sagi Grimberg
f5ebec9629 Target/iser: Wait for proper cleanup before unloading
disconnected_handler works are scheduled on system_wq.
When attempting to unload, first make sure all works
have cleaned up.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-20 11:18:43 -07:00
Sagi Grimberg
88c4015fda Target/iser: Improve cm events handling
There are 4 RDMA_CM events that all basically mean that
the user should teardown the IB connection:
- DISCONNECTED
- ADDR_CHANGE
- DEVICE_REMOVAL
- TIMEWAIT_EXIT

Only in DISCONNECTED/ADDR_CHANGE it makes sense to
call rdma_disconnect (send DREQ/DREP to our initiator).
So we keep the same teardown handler for all of them
but only indicate calling rdma_disconnect for the relevant
events.

This patch also removes redundant debug prints for each single
event.

v2 changes:
 - Call isert_disconnected_handler() for DEVICE_REMOVAL (Or + Sag)

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-20 11:17:40 -07:00
Bart Van Assche
5cfb17828d IB/srp: Add fast registration support
Certain HCA types (e.g. Connect-IB) and certain configurations (e.g.
ConnectX VF) support fast registration but not FMR. Hence add fast
registration support.

In function srp_rport_reconnect(), move the the srp_finish_req()
loop from after to before the srp_create_target_ib() call. This is
needed to avoid that srp_finish_req() tries to queue any
invalidation requests for rkeys associated with the old queue pair
on the newly allocated queue pair. Invoking srp_finish_req() before
the queue pair has been reallocated is safe since srp_claim_req()
handles completions correctly that arrive after srp_finish_req()
has been invoked.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:52 -07:00
Bart Van Assche
52ede08f00 IB/srp: Rename FMR-related variables
The next patch will cause the renamed variables to be shared between
the code for FMR and for FR memory registration. Make the names of
these variables independent of the memory registration mode. This
patch does not change any functionality. The start of this patch was
the changes applied via the following shell command:

sed -i.orig 's/SRP_FMR_SIZE/SRP_MAX_PAGES_PER_MR/g; \
    s/fmr_page_mask/mr_page_mask/g;s/fmr_page_size/mr_page_size/g; \
    s/fmr_page_shift/mr_page_shift/g;s/fmr_max_size/mr_max_size/g; \
    s/max_pages_per_fmr/max_pages_per_mr/g;s/nfmr/nmdesc/g; \
    s/fmr_len/dma_len/g' drivers/infiniband/ulp/srp/ib_srp.[ch]

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:52 -07:00
Bart Van Assche
d1b4289e16 IB/srp: One FMR pool per SRP connection
Allocate one FMR pool per SRP connection instead of one SRP pool
per HCA. This improves scalability of the SRP initiator.

Only request the SCSI mid-layer to retry a SCSI command after a
temporary mapping failure (-ENOMEM) but not after a permanent
mapping failure. This avoids that SCSI commands are retried
indefinitely if a permanent memory mapping failure occurs.

Tell the SCSI mid-layer to reduce queue depth temporarily in the
unlikely case where an application is queuing many requests with
more than max_pages_per_fmr sg-list elements.

For FMR pool allocation, base the max_pages_per_fmr parameter on
the HCA memory registration limit. Only try to allocate an FMR
pool if FMR is supported.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:52 -07:00
Bart Van Assche
b1b8854d16 IB/srp: Introduce the 'register_always' kernel module parameter
Add a kernel module parameter that enables memory registration also for SG-lists
that can be processed without memory registration. This makes it easier for kernel
developers to test the memory registration code.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:52 -07:00
Bart Van Assche
539dde6fc5 IB/srp: Introduce srp_finish_mapping()
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:52 -07:00
Bart Van Assche
76bc1e1ddd IB/srp: Introduce srp_map_fmr()
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:51 -07:00
Bart Van Assche
62154b2e89 IB/srp: Introduce an additional local variable
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:51 -07:00
Bart Van Assche
af24663bc8 IB/srp: Fix kernel-doc warnings
Avoid that the kernel-doc tool warns about missing argument
descriptions for the ib_srp.[ch] source files.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:51 -07:00
Bart Van Assche
024ca90151 IB/srp: Fix a sporadic crash triggered by cable pulling
Avoid that the loops that iterate over the request ring can encounter
a pointer to a SCSI command in req->scmnd that is no longer associated
with that request. If the function srp_unmap_data() is invoked twice
for a SCSI command that is not in flight then that would cause
ib_fmr_pool_unmap() to be invoked with an invalid pointer as argument,
resulting in a kernel oops.

Reported-by: Sagi Grimberg <sagig@mellanox.com>
Reference: http://thread.gmane.org/gmane.linux.drivers.rdma/19068/focus=19069
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:51 -07:00
Sagi Grimberg
9d49f5e284 Target/iser: Fix hangs in connection teardown
In ungraceful teardowns isert close flows seem racy such that
isert_wait_conn hangs as RDMA_CM_EVENT_DISCONNECTED never
gets invoked (no one called rdma_disconnect).

Both graceful and ungraceful teardowns will have rx flush errors
(isert posts a batch once connection is established). Once all
flush errors are consumed we invoke isert_wait_conn and it will
be responsible for calling rdma_disconnect. This way it can be
sure that rdma_disconnect was called and it won't wait forever.

This patch also removes the logout_posted indicator. either the
logout completion was consumed and no problem decrementing the
post_send_buf_count, or it was consumed as a flush error. no point
of keeping it for isert_wait_conn as there is no danger that
isert_conn will be accidentally removed while it is running.

(Drop unnecessary sleep_on_conn_wait_comp check in
 isert_cq_rx_comp_err - nab)

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-19 14:38:49 -07:00
Sagi Grimberg
e346ab343f Target/iser: Bail from accept_np if np_thread is trying to close
In case np_thread state is in RESET/SHUTDOWN/EXIT states,
no point for isert to stall there as we may get a hang in
case no one will wake it up later.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-19 14:32:58 -07:00
Sagi Grimberg
14f4b54fe3 Target/iscsi,iser: Avoid accepting transport connections during stop stage
When the target is in stop stage, iSER transport initiates RDMA disconnects.
The iSER initiator may wish to establish a new connection over the
still existing network portal. In this case iSER transport should not
accept and resume new RDMA connections. In order to learn that, iscsi_np
is added with enabled flag so the iSER transport can check when deciding
weather to accept and resume a new connection request.

The iscsi_np is enabled after successful transport setup, and disabled
before iscsi_np login threads are cleaned up.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-15 17:09:11 -07:00
Sagi Grimberg
531b7bf4bd Target/iser: Fix iscsit_accept_np and rdma_cm racy flow
RDMA CM and iSCSI target flows are asynchronous and completely
uncorrelated. Relying on the fact that iscsi_accept_np will be called
after CM connection request event and will wait for it is a mistake.

When attempting to login to a few targets this flow is racy and
unpredictable, but for parallel login to dozens of targets will
race and hang every time.

The correct synchronizing mechanism in this case is pending on
a semaphore rather than a wait_for_event. We keep the pending
interruptible for iscsi_np cleanup stage.

(Squash patch to remove dead code into parent - nab)

Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-15 17:09:10 -07:00
Sagi Grimberg
9fe63c88b1 Target/iser: Fix wrong connection requests list addition
Should be adding list_add_tail($new, $head) and not
the other way around.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-15 17:09:10 -07:00
Wilfried Klaebe
7ad24ea4bf net: get rid of SET_ETHTOOL_OPS
net: get rid of SET_ETHTOOL_OPS

Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone.
This does that.

Mostly done via coccinelle script:
@@
struct ethtool_ops *ops;
struct net_device *dev;
@@
-       SET_ETHTOOL_OPS(dev, ops);
+       dev->ethtool_ops = ops;

Compile tested only, but I'd seriously wonder if this broke anything.

Suggested-by: Dave Miller <davem@davemloft.net>
Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de>
Acked-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13 17:43:20 -04:00
Linus Torvalds
141eaccd01 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Here are the target pending updates for v3.15-rc1.  Apologies in
  advance for waiting until the second to last day of the merge window
  to send these out.

  The highlights this round include:

   - iser-target support for T10 PI (DIF) offloads (Sagi + Or)
   - Fix Task Aborted Status (TAS) handling in target-core (Alex Leung)
   - Pass in transport supported PI at session initialization (Sagi + MKP + nab)
   - Add WRITE_INSERT + READ_STRIP T10 PI support in target-core (nab + Sagi)
   - Fix iscsi-target ERL=2 ASYNC_EVENT connection pointer bug (nab)
   - Fix tcm_fc use-after-free of ft_tpg (Andy Grover)
   - Use correct ib_sg_dma primitives in ib_isert (Mike Marciniszyn)

  Also, note the virtio-scsi + vhost-scsi changes to expose T10 PI
  metadata into KVM guest have been left-out for now, as there where a
  few comments from MST + Paolo that where not able to be addressed in
  time for v3.15.  Please expect this feature for v3.16-rc1"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (43 commits)
  ib_srpt: Use correct ib_sg_dma primitives
  target/tcm_fc: Rename ft_tport_create to ft_tport_get
  target/tcm_fc: Rename ft_{add,del}_lport to {add,del}_wwn
  target/tcm_fc: Rename structs and list members for clarity
  target/tcm_fc: Limit to 1 TPG per wwn
  target/tcm_fc: Don't export ft_lport_list
  target/tcm_fc: Fix use-after-free of ft_tpg
  target: Add check to prevent Abort Task from aborting itself
  target: Enable READ_STRIP emulation in target_complete_ok_work
  target/sbc: Add sbc_dif_read_strip software emulation
  target: Enable WRITE_INSERT emulation in target_execute_cmd
  target/sbc: Add sbc_dif_generate software emulation
  target/sbc: Only expose PI read_cap16 bits when supported by fabric
  target/spc: Only expose PI mode page bits when supported by fabric
  target/spc: Only expose PI inquiry bits when supported by fabric
  target: Pass in transport supported PI at session initialization
  target/iblock: Fix double bioset_integrity_free bug
  Target/sbc: Initialize COMPARE_AND_WRITE write_sg scatterlist
  target/rd: T10-Dif: RAM disk is allocating more space than required.
  iscsi-target: Fix ERL=2 ASYNC_EVENT connection pointer bug
  ...
2014-04-12 16:51:08 -07:00
Mike Marciniszyn
b076808051 ib_srpt: Use correct ib_sg_dma primitives
The code was incorrectly using sg_dma_address() and
sg_dma_len() instead of ib_sg_dma_address() and
ib_sg_dma_len().

This prevents srpt from functioning with the
Intel HCA and indeed will corrupt memory
badly.

Cc: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Tested-by: Vinod Kumar <vinod.kumar@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: stable@vger.kernel.org # 3.3+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-11 15:31:20 -07:00
Nicholas Bellinger
e70beee783 target: Pass in transport supported PI at session initialization
In order to support local WRITE_INSERT + READ_STRIP operations for
non PI enabled fabrics, the fabric driver needs to be able signal
what protection offload operations are supported.

This is done at session initialization time so the modes can be
signaled by individual se_wwn + se_portal_group endpoints, as well
as optionally across different transports on the same endpoint.

For iser-target, set TARGET_PROT_ALL if the underlying ib_device
has already signaled PI offload support, and allow this to be
exposed via a new iscsit_transport->iscsit_get_sup_prot_ops()
callback.

For loopback, set TARGET_PROT_ALL to signal SCSI initiator mode
operation.

For all other drivers, set TARGET_PROT_NORMAL to disable fabric
level PI.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Quinn Tran <quinn.tran@qlogic.com>
Cc: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:54 -07:00
Sagi Grimberg
f225225848 Target/iser: Use Fastreg only if device supports signature
Fastreg is mandatory for signature, so if the
device doesn't support it we don't need to use it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:52 -07:00
Nicholas Bellinger
03e7848a64 iser-target: Add missing se_cmd put for WRITE_PENDING in tx_comp_err
This patch fixes a bug where outstanding RDMA_READs with WRITE_PENDING
status require an extra target_put_sess_cmd() in isert_put_cmd() code
when called from isert_cq_tx_comp_err() + isert_cq_drain_comp_llist()
context during session shutdown.

The extra kref PUT is required so that transport_generic_free_cmd()
invokes the last target_put_sess_cmd() -> target_release_cmd_kref(),
which will complete(&se_cmd->cmd_wait_comp) the outstanding se_cmd
descriptor with WRITE_PENDING status, and awake the completion in
target_wait_for_sess_cmds() to invoke TFO->release_cmd().

The bug was manifesting itself in target_wait_for_sess_cmds() where
a se_cmd descriptor with WRITE_PENDING status would end up sleeping
indefinately.

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:51 -07:00
Nicholas Bellinger
131e6abc67 target: Add TFO->abort_task for aborted task resources release
Now that TASK_ABORTED status is not generated for all cases by
TMR ABORT_TASK + LUN_RESET, a new TFO->abort_task() caller is
necessary in order to give fabric drivers a chance to unmap
hardware / software resources before the se_cmd descriptor is
released via the normal TFO->release_cmd() codepath.

This patch adds TFO->aborted_task() in core_tmr_abort_task()
in place of the original transport_send_task_abort(), and
also updates all fabric drivers to implement this caller.

The fabric drivers that include changes to perform cleanup
via ->aborted_task() are:

  - iscsi-target
  - iser-target
  - srpt
  - tcm_qla2xxx

The fabric drivers that currently set ->aborted_task() to
NOPs are:

  - loopback
  - tcm_fc
  - usb-gadget
  - sbp-target
  - vhost-scsi

For the latter five, there appears to be no additional cleanup
required before invoking TFO->release_cmd() to release the
se_cmd descriptor.

v2 changes:
  - Move ->aborted_task() call into transport_cmd_finish_abort (Alex)

Cc: Alex Leung <amleung21@yahoo.com>
Cc: Mark Rustad <mark.d.rustad@intel.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Vu Pham <vu@mellanox.com>
Cc: Chris Boot <bootc@bootc.net>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Giridhar Malavali <giridhar.malavali@qlogic.com>
Cc: Saurav Kashyap <saurav.kashyap@qlogic.com>
Cc: Quinn Tran <quinn.tran@qlogic.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:51 -07:00
Nicholas Bellinger
f46d6a8a01 iser-target: Match FRMR descriptors to available session tags
This patch changes isert_conn_create_fastreg_pool() to follow
logic in iscsi_target_locate_portal() for determining how many
FRMR descriptors to allocate based upon the number of possible
per-session command slots that are available.

This addresses an OOPs in isert_reg_rdma() where due to the
use of ISCSI_DEF_XMIT_CMDS_MAX could end up returning a bogus
fast_reg_descriptor when the number of active tags exceeded
the original hardcoded max.

Note this also includes moving isert_conn_create_fastreg_pool()
from isert_connect_request() to isert_put_login_tx() before
posting the final Login Response PDU in order to determine the
se_nacl->queue_depth (eg: number of tags) per session the target
will be enforcing.

v2 changes:
  - Move isert_conn->conn_fr_pool list_head init into
    isert_conn_request()
v3 changes:
  - Drop unnecessary list_empty() check in isert_reg_rdma()
    (Sagi)

Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:50 -07:00
Sagi Grimberg
5bac4b1a1f Target/iser: Fail SCSI WRITE command if device detected integrity error
If during data-transfer a data-integrity error was detected we
must fail the command with CHECK_CONDITION and not execute
the command.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:49 -07:00
Sagi Grimberg
96b7973e1c Target/iser: Move check signature status to a function
Remove code duplication from RDMA_READ and RDMA_WRITE
completions that do basically the same check.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:49 -07:00
Sagi Grimberg
897bb2c916 Target/iser: Consider DIF and RDMA_READ completions when calculating post_send counter
If protection is involved, iSER target must wait for
completion of RDMA_READ before sending SCSI response.
So we must consider that when calculating post_send_buf_count
additions, also when processing good/error completions.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:48 -07:00
Sagi Grimberg
c2caa20777 Target/iser: Fix signature work requests accounting
As REG_SIG_MR work request and it's LOCAL_INVALIDATE are
not accounted in post_send_buf_count we must color these
with ISER_FASTREG_LI_WRID in order to process their error
completions when the QP flushes.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:48 -07:00
Sagi Grimberg
9e961ae73c IB/isert: Support T10-PI protected transactions
In case the Target core passed transport T10 protection
operation:

1. Register data buffer (data memory region)
2. Register protection buffer if exsists (prot memory region)
3. Register signature region (signature memory region)
   - use work request IB_WR_REG_SIG_MR
4. Execute RDMA
5. Upon RDMA completion check the signature status
   - if succeeded send good SCSI response
   - if failed send SCSI bad response with appropriate sense buffer

(Fix up compile error in isert_reg_sig_mr, and fix up incorrect
 se_cmd->prot_type -> TARGET_PROT_NORMAL comparision - nab)
(Fix failed sector assignment in isert_completion_rdma_* - Sagi + nab)
(Fix enum assignements for protection type - Sagi)
(Fix devision on 32-bit in isert_completion_rdma_* - Sagi + Fengguang)
(Fix context change for v3.14-rc6 code - nab)
(Fix iscsit_build_rsp_pdu inc_statsn flag usage - nab)

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:47 -07:00
Sagi Grimberg
f93f3a70da IB/isert: Accept RDMA_WRITE completions
In case of protected transactions, we will need to check the
protection status of the transaction before sending SCSI response.
So be ready for RDMA_WRITE completions. currently we don't ask
for these completions, but for T10-PI we will.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:46 -07:00
Sagi Grimberg
d3e125dac1 IB/isert: Initialize T10-PI resources
Introduce pi_context to hold relevant RDMA protection resources.
We eliminate data_key_valid boolean and replace it with indicators
container to indicate:
- Is the descriptor protected (registered via signature MR)
- Is the data_mr key valid (can spare LOCAL_INV WR)
- Is the prot_mr key valid (can spare LOCAL_INV WR)
- Is the sig_mr key valid (can spare LOCAL_INV WR)

Upon connection establishment check if network portal is T10-PI
enabled and allocate T10-PI resources if necessary, allocate
signature enabled memory regions and mark connection queue-pair
as signature enabled.

(Fix context change for v3.14-rc6 code - nab)

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:46 -07:00
Sagi Grimberg
e3d7e4c30c IB/isert: Introduce isert_map/unmap_data_buf
export map/unmap data buffer to a routine that may
be used in various places in the code and keep the
mapping data in a designated descriptor. Also, let
isert_fast_reg_mr to decide weather to use global
MR or do fast registration.

This commit does not change any functionality.

(Fix context change for v3.14-rc6 code - nab)

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:45 -07:00
Linus Torvalds
877f075aac Main batch of InfiniBand/RDMA changes for 3.15:
- The biggest change is core API extensions and mlx5 low-level driver
    support for handling DIF/DIX-style protection information, and the
    addition of PI support to the iSER initiator.  Target support will be
    arriving shortly through the SCSI target tree.
 
  - A nice simplification to the "umem" memory pinning library now that
    we have chained sg lists.  Kudos to Yishai Hadas for realizing our
    code didn't have to be so crazy.
 
  - Another nice simplification to the sg wrappers used by qib, ipath and
    ehca to handle their mapping of memory to adapter.
 
  - The usual batch of fixes to bugs found by static checkers etc. from
    intrepid people like Dan Carpenter and Yann Droneaud.
 
  - A large batch of cxgb4, ocrdma, qib driver updates.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJTPYBnAAoJEENa44ZhAt0hGI4P/29eotGwpkANUQE6FQvxCUL2
 CXJtSg52lmYvGJrPK4IhihpbtQmHJz3iXEzlOOWidTw1dJgObR6vFaRymh7+vDLs
 CdzybMcXdasarqTuYeJbFzhkimpwtWWrMy/8Ik/Jj/5glGQ6cUSpdYZzVtFhYNqf
 hCGE8iLi+tuekJJj1htut5D6apXM7udcdc2yLJNOdsSj/VUXt1oqG1x9xAi9R8Tq
 7o8eFSStdlja0EBQ6Hli2zauCSnQkaUtr8h6EAFbcCtvBK8HqsHSc2gfq2ViFUiN
 ztt167oWoQnVkR0qCPL5nVt+CRQHHROprVXvbpcTI3aW61gNIl6OrUUOXefzHXac
 TNi+fdMpiEB/JQ4Z04Jzd1dGCSjYeTqPj4rO4meFjBmxRDdTgZHu7FWwejT1nYJ5
 d2abVdCOT+QWlIlM7m/pjdWJII5OYM+4/jtTayGepEaR4fTUzKtPZPBLNUBDBKE+
 4f92PC8LiuPkwJgb6XT96onPz1bDCOnPSEdwoKUFKPeGUcwgVOM/Wx5NU4Yf7rfg
 RxQwZ7mJXbjCYFlmGGo/0QDy6UEGkIFYlJSzooP+wlK1JvZ5h2M+9QKX2FtwzR+R
 I2kBxcTXWsM/h88R7MkNqbNIllmhssrJwmAE46OneZbfoBOB+JZjb4nLRTu0jEcS
 zn6f16GmJ37BKn2/qYY/
 =Ww6H
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband updates from Roland Dreier:
 "Main batch of InfiniBand/RDMA changes for 3.15:

   - The biggest change is core API extensions and mlx5 low-level driver
     support for handling DIF/DIX-style protection information, and the
     addition of PI support to the iSER initiator.  Target support will
     be arriving shortly through the SCSI target tree.

   - A nice simplification to the "umem" memory pinning library now that
     we have chained sg lists.  Kudos to Yishai Hadas for realizing our
     code didn't have to be so crazy.

   - Another nice simplification to the sg wrappers used by qib, ipath
     and ehca to handle their mapping of memory to adapter.

   - The usual batch of fixes to bugs found by static checkers etc.
     from intrepid people like Dan Carpenter and Yann Droneaud.

   - A large batch of cxgb4, ocrdma, qib driver updates"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (102 commits)
  RDMA/ocrdma: Unregister inet notifier when unloading ocrdma
  RDMA/ocrdma: Fix warnings about pointer <-> integer casts
  RDMA/ocrdma: Code clean-up
  RDMA/ocrdma: Display FW version
  RDMA/ocrdma: Query controller information
  RDMA/ocrdma: Support non-embedded mailbox commands
  RDMA/ocrdma: Handle CQ overrun error
  RDMA/ocrdma: Display proper value for max_mw
  RDMA/ocrdma: Use non-zero tag in SRQ posting
  RDMA/ocrdma: Memory leak fix in ocrdma_dereg_mr()
  RDMA/ocrdma: Increment abi version count
  RDMA/ocrdma: Update version string
  be2net: Add abi version between be2net and ocrdma
  RDMA/ocrdma: ABI versioning between ocrdma and be2net
  RDMA/ocrdma: Allow DPP QP creation
  RDMA/ocrdma: Read ASIC_ID register to select asic_gen
  RDMA/ocrdma: SQ and RQ doorbell offset clean up
  RDMA/ocrdma: EQ full catastrophe avoidance
  RDMA/cxgb4: Disable DSGL use by default
  RDMA/cxgb4: rx_data() needs to hold the ep mutex
  ...
2014-04-03 16:57:19 -07:00
Roland Dreier
f7eaa7ed8f Merge branches 'core', 'cxgb4', 'ip-roce', 'iser', 'misc', 'mlx4', 'nes', 'ocrdma', 'qib', 'sgwrapper', 'srp' and 'usnic' into for-next 2014-04-03 08:30:17 -07:00
Or Gerlitz
5de2ad986d IB/iser: Bump driver version to 1.3
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:09:46 -07:00
Or Gerlitz
3ee07d27ac IB/iser: Update Mellanox copyright note
Update Mellanox copyrights for 2014 on the iser initiator driver.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:09:46 -07:00
Or Gerlitz
4f9208ad3f IB/iser: Print QP information once connection is established
Add an iser info print with the local/remote QP information carried
out when the connection is established.  While here, fix a little
leftover from the T10 work and set a debug print to be carried in
debug and not info level.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:09:46 -07:00
Ariel Nahum
4667f5dfb0 IB/iser: Remove struct iscsi_iser_conn
The iscsi stack has existing mechanisms to link back and forth between
the iscsi connection and the iscsi transport (e.g iser/tcp) connection.

This is done through a dd_data pointer field in struct iscsi_conn
which can be set to point to the transport connection, etc.

The iscsi_iser_conn structure was used to get this linking done in
another way, which is uneeded and adds extra complication to the iser
code, so we just remove it.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:09:46 -07:00
Roi Dayan
1d6c2b736f IB/iser: Drain the tx cq once before looping on the rx cq
The iser disconnection flow isn't done before all the inflight
recv/send buffers posted to the QP are either flushed or normally
completed to the CQ that serves this connection.  The condition check
is done in iser_handle_comp_error().

Currently, it's possible for the send buffer completion that makes the
posted send buffers counter reach zero to be polled in the drain tx
call, which is after the rx cq is fully drained.  Since this
completion might be not an error one (for example, it might be a
completion of the logout request iSCSI PDU) we will skip
iser_handle_comp_error().  So the connection will never terminate from
the iscsi stack point of view, and we hang.

To resolve this race, do the draining of the tx cq before the loop on
the rx cq.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:09:46 -07:00
Randy Dunlap
39c978cd17 IB/iser: Fix sector_t format warning
Fix pr_err (printk) format warning:

    drivers/infiniband/ulp/iser/iser_verbs.c:1181:4: warning: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'sector_t' [-Wformat]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:01:08 -07:00
Bart Van Assche
b3fe628da2 IB/srp: Fix a race condition between failing I/O and I/O completion
Avoid that srp_terminate_io() can access req->scmnd after it has been
cleared by the I/O completion code. Do this by protecting req->scmnd
accesses from srp_terminate_io() via locking

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:31 -07:00
Bart Van Assche
ac72d766a8 IB/srp: Avoid that writing into "add_target" hangs due to a cable pull
If a cable is pulled while srp_connect_target() is in progress
that can result in that function never to return. That makes the
process, e.g. srp_daemon, that invoked this function unkillable.
Avoid this by letting srp_connect_target() finish if the event
IB_CM_TIMEWAIT_EXIT is received. This patch fixes a hang with the
following call trace:

 [<ffffffff814eae85>] schedule_timeout+0x215/0x2e0
 [<ffffffff814eab03>] wait_for_common+0x123/0x180
 [<ffffffff814eac1d>] wait_for_completion+0x1d/0x20
 [<ffffffffa03b398c>] srp_connect_target+0x1dc/0x410 [ib_srp]
 [<ffffffffa03b5809>] srp_create_target+0xba9/0xe70 [ib_srp]
 [<ffffffff8133e590>] dev_attr_store+0x20/0x30
 [<ffffffff811eb8f5>] sysfs_write_file+0xe5/0x170
 [<ffffffff811767c8>] vfs_write+0xb8/0x1a0
 [<ffffffff811770c1>] sys_write+0x51/0x90
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:31 -07:00
Bart Van Assche
a702adceec IB/srp: Make writing into the "add_target" sysfs attribute interruptible
Avoid that stopping srp_daemon takes unusually long due to a cable
pull by making writing into the "add_target" sysfs attribute
interruptible.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:31 -07:00
Bart Van Assche
2d7091bcf6 IB/srp: Avoid duplicate connections
The connection uniqueness check is performed before a new connection
is added to the target list. This patch protects both actions by a
mutex such that simultaneous writes from two different threads into the
"add_target" variable do not result in duplicate connections.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:31 -07:00
Bart Van Assche
e7ffde0164 IB/srp: Add more logging
Log sgid and dgid when reporting that a login has been rejected or when
a host has been added. This makes it easy to figure out which initiator
and target ports these messages apply to.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:30 -07:00
Sagi Grimberg
2088ca66f5 IB/srp: Check ib_query_gid return value
Detected by Coverity.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:30 -07:00
Sagi Grimberg
6192d4e6bb IB/iser: Publish T10-PI support to SCSI midlayer
After allocating a scsi_host we set protection types and guard type
supported.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Sagi Grimberg
0a7a08ad6f IB/iser: Implement check_protection
Once the iSCSI transaction is completed we must implement
check_protection in order to notify on DIF errors that may have
occured.

The routine boils down to calling ib_check_mr_status to get the
signature status of the transaction.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Sagi Grimberg
177e31bd5a IB/iser: Support T10-PI operations
Add logic to initialize protection information entities.  Upon each
iSCSI task, we keep the scsi_cmnd in order to query the scsi
protection operations and reference to protection buffers.

Modify iser_fast_reg_mr to receive indication whether it is
registering the data or protection buffers.

In addition introduce iser_reg_sig_mr which performs fast registration
work-request for a signature enabled memory region
(IB_WR_REG_SIG_MR).  In this routine we set all the protection
relevants for the device to offload protection data-transfer and
verification.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Alex Tabachnik
6b5a8fb0d2 IB/iser: Initialize T10-PI resources
During connection establishment we also initialize T10-PI resources
(QP, PI contexts) in order to support SCSI's protection operations.

Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Alex Tabachnik
7f73384752 IB/iser: Introduce pi_enable, pi_guard module parameters
Use modparams to activate protection information support.

pi_enable bool: Based on this parameter iSER will know if it should
support T10-PI.  We don't want to do this by default as it requires to
allocate and initialize extra resources.  In case pi_enable=N, iSER
won't publish to SCSI midlayer any DIF capabilities.

pi_guard int: Based on this parameter iSER will publish DIX guard type
support to SCSI midlayer.  0 means CRC is allowed to be passed in DIX
buffers, 1 (or non-zero) means IP-CSUM is allowed to be passed in DIX
buffers.  Note that over the wire, only CRC is allowed.

In the next phase, it is worth considering passing these parameters
from iscsid via nlmsg.  This will allow these parameters to be
connection based rather than global.

Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Sagi Grimberg
5f588e3d0c IB/iser: Generalize fall_to_bounce_buf routine
Unaligned SG-lists may also happen for protection information.
Generalize bounce buffer routine to handle any iser_data_buf which may
be data and/or protection.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
9a8b08fad2 IB/iser: Generalize iser_unmap_task_data and finalize_rdma_unaligned_sg
This routines operates on data buffers and may also work with
protection infomation buffers.  So we generalize them to handle an
iser_data_buf which can be the command data or command protection
information.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
73bc06b7ed IB/iser: Replace fastreg descriptor valid bool with indicators container
In T10-PI support we will have memory keys for protection buffers and
signature transactions.  We prefer to compact indicators rather than
keeping multiple bools.

This commit does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
65198d6b84 IB/iser: Keep IB device attributes under iser_device
For T10-PI offload support, we will need to know the device signature
offload capability upon every connection establishment.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
310b347c60 IB/iser: Move fast_reg_descriptor initialization to a function
fastreg descriptor will include protection information context.  In
order to place the logic in one place we introduce iser_create_fr_desc
function.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
d11ec4ecf0 IB/iser: Push the decision what memory key to use into fast_reg_mr routine
This is a preparation step for T10-PI offload support.  We prefer to
push the desicion of which mkey to use (global or fastreg) to
iser_fast_reg_mr.  We choose to do this since in T10-PI we may need to
register for protection buffers and in this case we wish to simplify
iser_fast_reg_mr instead of repeating the logic of which key to use.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
7306b8fad4 IB/iser: Avoid FRWR notation, use fastreg instead
FRWR stands for "fast registration work request". We want to avoid
calling the fastreg pool with that name, instead we name it fastreg
which stands for "fast registration".

This pool will include more elements in the future, so it is a good
idea to generalize the name.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
db523b8de1 IB/iser: Suppress completions for fast registration work requests
In case iSER uses fast registration method, it should not request for
successful completions on fast registration nor local invalidate
requests.  We color wr_id with ISER_FRWR_LI_WRID in order to correctly
consume error completions.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:54 -07:00
Linus Torvalds
66a523db70 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "This series addresses a number of outstanding issues wrt to active I/O
  shutdown using iser-target.  This includes:

   - Fix a long standing tpg_state bug where a tpg could be referenced
     during explicit shutdown (v3.1+ stable)
   - Use list_del_init for iscsi_cmd->i_conn_node so list_empty checks
     work as expected (v3.10+ stable)
   - Fix a isert_conn->state related hung task bug + ensure outstanding
     I/O completes during session shutdown.  (v3.10+ stable)
   - Fix isert_conn->post_send_buf_count accounting for RDMA READ/WRITEs
     (v3.10+ stable)
   - Ignore FRWR completions during active I/O shutdown (v3.12+ stable)
   - Fix command leakage for interrupt coalescing during active I/O
     shutdown (v3.13+ stable)

  Also included is another DIF emulation fix from Sagi specific to
  v3.14-rc code"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  Target/sbc: Fix sbc_copy_prot for offset scatters
  iser-target: Fix command leak for tx_desc->comp_llnode_batch
  iser-target: Ignore completions for FRWRs in isert_cq_tx_work
  iser-target: Fix post_send_buf_count for RDMA READ/WRITE
  iscsi/iser-target: Fix isert_conn->state hung shutdown issues
  iscsi/iser-target: Use list_del_init for ->i_conn_node
  iscsi-target: Fix iscsit_get_tpg_from_np tpg_state bug
2014-03-09 13:50:14 -07:00
Nicholas Bellinger
ebbe442183 iser-target: Fix command leak for tx_desc->comp_llnode_batch
This patch addresses a number of active I/O shutdown issues
related to isert_cmd descriptors being leaked that are part
of a completion interrupt coalescing batch.

This includes adding logic in isert_cq_tx_comp_err() to
drain any associated tx_desc->comp_llnode_batch, as well
as isert_cq_drain_comp_llist() to drain any associated
isert_conn->conn_comp_llist.

Also, set tx_desc->llnode_active in isert_init_send_wr()
in order to determine when work requests need to be skipped
in isert_cq_tx_work() exception path code.

Finally, update isert_init_send_wr() to only allow interrupt
coalescing when ISER_CONN_UP.

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.13+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:10 -08:00
Nicholas Bellinger
9bb4ca68fc iser-target: Ignore completions for FRWRs in isert_cq_tx_work
This patch changes IB_WR_FAST_REG_MR + IB_WR_LOCAL_INV related
work requests to include a ISER_FRWR_LI_WRID value in order to
signal isert_cq_tx_work() that these requests should be ignored.

This is necessary because even though IB_SEND_SIGNALED is not
set for either work request, during a QP failure event the work
requests will be returned with exception status from the TX
completion queue.

v2 changes:
 - Rename ISER_FRWR_LI_WRID -> ISER_FASTREG_LI_WRID (Sagi)

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:10 -08:00
Nicholas Bellinger
b6b87a1df6 iser-target: Fix post_send_buf_count for RDMA READ/WRITE
This patch fixes the incorrect setting of ->post_send_buf_count
related to RDMA WRITEs + READs where isert_rdma_rw->send_wr_num
was not being taken into account.

This includes incrementing ->post_send_buf_count within
isert_put_datain() + isert_get_dataout(), decrementing within
__isert_send_completion() + isert_response_completion(), and
clearing wr->send_wr_num within isert_completion_rdma_read()

This is necessary because even though IB_SEND_SIGNALED is
not set for RDMA WRITEs + READs, during a QP failure event
the work requests will be returned with exception status
from the TX completion queue.

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:09 -08:00
Nicholas Bellinger
defd884845 iscsi/iser-target: Fix isert_conn->state hung shutdown issues
This patch addresses a couple of different hug shutdown issues
related to wait_event() + isert_conn->state.  First, it changes
isert_conn->conn_wait + isert_conn->conn_wait_comp_err from
waitqueues to completions, and sets ISER_CONN_TERMINATING from
within isert_disconnect_work().

Second, it splits isert_free_conn() into isert_wait_conn() that
is called earlier in iscsit_close_connection() to ensure that
all outstanding commands have completed before continuing.

Finally, it breaks isert_cq_comp_err() into seperate TX / RX
related code, and adds logic in isert_cq_rx_comp_err() to wait
for outstanding commands to complete before setting ISER_CONN_DOWN
and calling complete(&isert_conn->conn_wait_comp_err).

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:09 -08:00
Nicholas Bellinger
5159d763f6 iscsi/iser-target: Use list_del_init for ->i_conn_node
There are a handful of uses of list_empty() for cmd->i_conn_node
within iser-target code that expect to return false once a cmd
has been removed from the per connect list.

This patch changes all uses of list_del -> list_del_init in order
to ensure that list_empty() returns false as expected.

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:09 -08:00
Linus Torvalds
946dd683af Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "Mostly minor fixes this time to v3.14-rc1 related changes.  Also
  included is one fix for a free after use regression in persistent
  reservations UNREGISTER logic that is CC'ed to >= v3.11.y stable"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  Target/sbc: Fix protection copy routine
  IB/srpt: replace strict_strtoul() with kstrtoul()
  target: Simplify command completion by removing CMD_T_FAILED flag
  iser-target: Fix leak on failure in isert_conn_create_fastreg_pool
  iscsi-target: Fix SNACK Type 1 + BegRun=0 handling
  target: Fix missing length check in spc_emulate_evpd_83()
  qla2xxx: Remove last vestiges of qla_tgt_cmd.cmd_list
  target: Fix 32-bit + CONFIG_LBDAF=n link error w/ sector_div
  target: Fix free-after-use regression in PR unregister
2014-02-15 16:18:47 -08:00
Dan Carpenter
fd8b48b22a IB/iser: Fix use after free in iser_snd_completion()
We use "tx_desc" again after we free it.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-14 09:48:09 -08:00
Roi Dayan
7d9eacf945 IB/iser: Avoid dereferencing iscsi_iser conn object when not bound to iser connection
Fix a possible NULL pointer dereference in disconnection flow. This
can happen if the target disconnected/rejected the connection request,
e.g before the binding stage between iscsi connection to the transport
connection.

Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-14 09:48:03 -08:00
Jingoo Han
9d8abf4594 IB/srpt: replace strict_strtoul() with kstrtoul()
The usage of strict_strtoul() is not preferred, because
strict_strtoul() is obsolete. Thus, kstrtoul() should be
used.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-02-12 15:14:40 -08:00
Nicholas Bellinger
a80e21b3b2 iser-target: Fix leak on failure in isert_conn_create_fastreg_pool
This patch fixes a memory leak for fr_desc upon failure of
isert_create_fr_desc() in isert_conn_create_fastreg_pool()
code.

As reported by Coverity 1166659:

*** CID 1166659:  Resource leak  (RESOURCE_LEAK)
/drivers/infiniband/ulp/isert/ib_isert.c: 470 in isert_conn_create_fastreg_pool()
464                      isert_conn, isert_conn->conn_fr_pool_size);
465
466             return 0;
467
468     err:
469             isert_conn_free_fastreg_pool(isert_conn);
>>>     CID 1166659:  Resource leak  (RESOURCE_LEAK)
>>>     Variable "fr_desc" going out of scope leaks the storage it points to.
470             return ret;
471     }
472
473     static int
474     isert_connect_request(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
475     {

Cc: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-02-12 15:14:11 -08:00
Linus Torvalds
4e13c5d021 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "The highlights this round include:

  - add support for SCSI Referrals (Hannes)
  - add support for T10 DIF into target core (nab + mkp)
  - add support for T10 DIF emulation in FILEIO + RAMDISK backends (Sagi + nab)
  - add support for T10 DIF -> bio_integrity passthrough in IBLOCK backend (nab)
  - prep changes to iser-target for >= v3.15 T10 DIF support (Sagi)
  - add support for qla2xxx N_Port ID Virtualization - NPIV (Saurav + Quinn)
  - allow percpu_ida_alloc() to receive task state bitmask (Kent)
  - fix >= v3.12 iscsi-target session reset hung task regression (nab)
  - fix >= v3.13 percpu_ref se_lun->lun_ref_active race (nab)
  - fix a long-standing network portal creation race (Andy)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits)
  target: Fix percpu_ref_put race in transport_lun_remove_cmd
  target/iscsi: Fix network portal creation race
  target: Report bad sector in sense data for DIF errors
  iscsi-target: Convert gfp_t parameter to task state bitmask
  iscsi-target: Fix connection reset hang with percpu_ida_alloc
  percpu_ida: Make percpu_ida_alloc + callers accept task state bitmask
  iscsi-target: Pre-allocate more tags to avoid ack starvation
  qla2xxx: Configure NPIV fc_vport via tcm_qla2xxx_npiv_make_lport
  qla2xxx: Enhancements to enable NPIV support for QLOGIC ISPs with TCM/LIO.
  qla2xxx: Fix scsi_host leak on qlt_lport_register callback failure
  IB/isert: pass scatterlist instead of cmd to fast_reg_mr routine
  IB/isert: Move fastreg descriptor creation to a function
  IB/isert: Avoid frwr notation, user fastreg
  IB/isert: seperate connection protection domains and dma MRs
  tcm_loop: Enable DIF/DIX modes in SCSI host LLD
  target/rd: Add DIF protection into rd_execute_rw
  target/rd: Add support for protection SGL setup + release
  target/rd: Refactor rd_build_device_space + rd_release_device_space
  target/file: Add DIF protection support to fd_execute_rw
  target/file: Add DIF protection init/format support
  ...
2014-01-31 15:31:23 -08:00
Nicholas Bellinger
676687c696 iscsi-target: Convert gfp_t parameter to task state bitmask
This patch propigates the use of task state bitmask now used by
percpu_ida_alloc() up the iscsi-target callchain, replacing the
use of GFP_ATOMIC for TASK_RUNNING, and GFP_KERNEL for
TASK_INTERRUPTIBLE.

Also, drop the unnecessary gfp_t parameter to isert_allocate_cmd(),
and just pass TASK_INTERRUPTIBLE into iscsit_allocate_cmd().

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-25 06:58:52 +00:00
Roland Dreier
8f399921ea Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next 2014-01-22 23:24:13 -08:00
Michal Schmidt
437708c443 IPoIB: Report operstate consistently when brought up without a link
After booting without a working link, "ip link" shows:

 5: mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast
 state DOWN qlen 256
    ...
 7: mlx4_ib1.8003@mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc
 pfifo_fast state DOWN qlen 256
    ...

Then after connecting and disconnecting the link, which should result
in exactly the same state as before, it shows:

 5: mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast
 state DOWN qlen 256
    ...
 7: mlx4_ib1.8003@mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc
 pfifo_fast state LOWERLAYERDOWN qlen 256
    ...

Notice the (now correct) LOWERLAYERDOWN operstate shown for the
mlx4_ib1.8003 interface. Ideally the identical state would be shown
right after boot.

The problem is related to the calling of netif_carrier_off() in
network drivers.  For a long time it was known that doing
netif_carrier_off() before registering the netdevice would result in
the interface's operstate being shown as UNKNOWN if the device was
brought up without a working link. This problem was fixed in commit
8f4cccbbd9 ('net: Set device operstate at registration time'), but
still there remains the minor inconsistency demonstrated above.

This patch fixes it by moving ipoib's call to netif_carrier_off() into
the .ndo_open method, which is where network drivers ordinarily do it.
With the patch when doing the same test as above, the operstate of
mlx4_ib1.8003 is shown as LOWERLAYERDOWN right after boot.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:01:05 -08:00
Bart Van Assche
93079162bf scsi_transport_srp: Fix a race condition
The rport timers must be stopped before the SRP initiator destroys the
resources associated with the SCSI host. This is necessary because
otherwise the callback functions invoked from the SRP transport layer
could trigger a use-after-free. Stopping the rport timers before
invoking scsi_remove_host() can trigger long delays in the SCSI error
handler if a transport layer failure occurs while scsi_remove_host()
is in progress. Hence move the code for stopping the rport timers from
srp_rport_release() into a new function and invoke that function after
scsi_remove_host() has finished. This patch fixes the following
sporadic kernel crash:

     kernel BUG at include/asm-generic/dma-mapping-common.h:64!
     invalid opcode: 0000 [#1] SMP
     RIP: 0010:[<ffffffffa03b20b1>]  [<ffffffffa03b20b1>] srp_unmap_data+0x121/0x130 [ib_srp]
     Call Trace:
     [<ffffffffa03b20fc>] srp_free_req+0x3c/0x80 [ib_srp]
     [<ffffffffa03b2188>] srp_finish_req+0x48/0x70 [ib_srp]
     [<ffffffffa03b21fb>] srp_terminate_io+0x4b/0x60 [ib_srp]
     [<ffffffffa03a6fb5>] __rport_fail_io_fast+0x75/0x80 [scsi_transport_srp]
     [<ffffffffa03a7438>] rport_fast_io_fail_timedout+0x88/0xc0 [scsi_transport_srp]
     [<ffffffff8108b370>] worker_thread+0x170/0x2a0
     [<ffffffff81090876>] kthread+0x96/0xa0
     [<ffffffff8100c0ca>] child_rip+0xa/0x20

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-21 10:46:17 -08:00
Sagi Grimberg
9bd626e79d IB/isert: pass scatterlist instead of cmd to fast_reg_mr routine
This routine may help for protection registration as well.
This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-19 02:22:09 +00:00
Sagi Grimberg
dc87a90f5d IB/isert: Move fastreg descriptor creation to a function
This routine may be called both by fast registration
descriptors for data and for integrity buffers.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-19 02:22:08 +00:00
Sagi Grimberg
a3a5a82627 IB/isert: Avoid frwr notation, user fastreg
Use fast registration lingo. fast registration will
also incorporate signature/DIF registration.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-19 02:22:07 +00:00
Sagi Grimberg
eb6ab13267 IB/isert: seperate connection protection domains and dma MRs
It is more correct to seperate connections protection domains
and dma_mr handles. protection information support requires to
do so.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-19 02:22:07 +00:00
Matan Barak
90f1d1b41b IB/core: Add flow steering support for IPoIB UD traffic
When creating an IPoIB UD QP, provide a hint to the low level driver
that the QP should support flow-steering.  This means that privileged
user space applications can steer TCP/IP IPoIB traffic from the
network stack, in a similar manner done with Ethernet RAW_PACKET QPs.

The hint is provided through new QP creation flag called NETIF_QP.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 14:06:50 -08:00
Hangbin Liu
0d68fc4f12 infiniband: make sure the src net is infiniband when create new link
When we create a new infiniband link with uninfiniband device, e.g. `ip link
add link em1 type ipoib pkey 0x8001`. We will get a NULL pointer dereference
cause other dev like Ethernet don't have struct ib_device.

The code path is:
rtnl_newlink
  |-- ipoib_new_child_link
        |-- __ipoib_vlan_add
              |-- ipoib_set_dev_features
                    |-- ib_query_device

Fix this bug by make sure the src net is infiniband when create new link.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 20:38:56 -05:00
Nicholas Bellinger
2853c2b667 iser-target: Move INIT_WORK setup into isert_create_device_ib_res
This patch moves INIT_WORK setup for cq_desc->cq_[rx,tx]_work into
isert_create_device_ib_res(), instead of being done each callback
invocation in isert_cq_[rx,tx]_callback().

This also fixes a 'INFO: trying to register non-static key' warning
when cancel_work_sync() is called before INIT_WORK has setup the
struct work_struct.

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-12-19 00:18:43 -08:00
Wei Yongjun
94a7111043 iser-target: fix error return code in isert_create_device_ib_res()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-12-11 11:54:47 -08:00
Linus Torvalds
b0e3636f65 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Things have been quiet this round with mostly bugfixes, percpu
  conversions, and other minor iscsi-target conformance testing changes.

  The highlights include:

   - Add demo_mode_discovery attribute for iscsi-target (Thomas)
   - Convert tcm_fc(FCoE) to use percpu-ida pre-allocation
   - Add send completion interrupt coalescing for ib_isert
   - Convert target-core to use percpu-refcounting for se_lun
   - Fix mutex_trylock usage bug in iscsit_increment_maxcmdsn
   - tcm_loop updates (Hannes)
   - target-core ALUA cleanups + prep for v3.14 SCSI Referrals support (Hannes)

  v3.14 is currently shaping to be a busy development cycle in target
  land, with initial support for T10 Referrals and T10 DIF currently on
  the roadmap"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (40 commits)
  iscsi-target: chap auth shouldn't match username with trailing garbage
  iscsi-target: fix extract_param to handle buffer length corner case
  iscsi-target: Expose default_erl as TPG attribute
  target_core_configfs: split up ALUA supported states
  target_core_alua: Make supported states configurable
  target_core_alua: Store supported ALUA states
  target_core_alua: Rename ALUA_ACCESS_STATE_OPTIMIZED
  target_core_alua: spellcheck
  target core: rename (ex,im)plict -> (ex,im)plicit
  percpu-refcount: Add percpu-refcount.o to obj-y
  iscsi-target: Do not reject non-immediate CmdSNs exceeding MaxCmdSN
  iscsi-target: Convert iscsi_session statistics to atomic_long_t
  target: Convert se_device statistics to atomic_long_t
  target: Fix delayed Task Aborted Status (TAS) handling bug
  iscsi-target: Reject unsupported multi PDU text command sequence
  ib_isert: Avoid duplicate iscsit_increment_maxcmdsn call
  iscsi-target: Fix mutex_trylock usage in iscsit_increment_maxcmdsn
  target: Core does not need blkdev.h
  target: Pass through I/O topology for block backstores
  iser-target: Avoid using FRMR for single dma entry requests
  ...
2013-11-22 10:52:03 -08:00
Linus Torvalds
1ea406c0e0 Main batch of InfiniBand/RDMA changes for 3.13:
- Re-enable flow steering verbs with new improved userspace ABI
  - Fixes for slow connection due to GID lookup scalability
  - IPoIB fixes
  - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
  - Further improvements to SRP error handling
  - Add new transport type for Cisco usNIC
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABCAAGBQJSil7BAAoJEENa44ZhAt0hbtgP/A+AmUalbOX6ZKzuOFxsrtY2
 r55CX9b1JBeFM/Zhn2o6y+81lpCjkckJSggESMe4izNgocGw0nW4vYGN4SBynatj
 y8sR9OSn+G3ihuENrzG41MJUGEa5WbcNMy4boN+Oa+qyTlV/WjLR7Fv4WbikK7Wm
 o8FNlXiiDhMoGfHHG5J0MD0EQsnxuLDk2XP+ciu4tLtTs+wBka+gFK8WnMvztle3
 gTeMNna5ilvCS2fdBxteuPA3KeDnJE9AgJSMJ2a4Rh+DR8uTgWYQ6n7amjmOc546
 yhAKkoBkxPE10+Yj82WOPhCFxSeWcuSwJvpgv5dTVZ1XqUUcC1V3TEcZDHmyyHQ7
 uPXgS1A+erBW3OYPBjZqtKvnHObscV12fL+rId3vIhcAQIbFroci08ZwPidEYRkn
 fvwlEKcrIsBIpRXEyjlFCxsiiDnfq1wC1VayMR3jrIK0P6idf1SXf/geiRp9+RGT
 wKUc0j51jvEx29qc65xuhEP9FQV9pCMxyd+FEE0d0KkjMz5hsIkjmcUcBbgF0CGg
 GEyDPlgRLv+vmWDGpT8XraaV/0CJOEQDIgB4WSN87/AZ4UoNt7spW2xqsLsp1toy
 5e0100tpWUleTPLe/Wig5GtBdagQ2jAUK1+186CP93pFPtkwc4/7X3hyp7qPIPTz
 VDvT9DEy6zjSMCLpMcdo
 =xxC+
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/rdma updates from Roland Dreier:
 - Re-enable flow steering verbs with new improved userspace ABI
 - Fixes for slow connection due to GID lookup scalability
 - IPoIB fixes
 - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
 - Further improvements to SRP error handling
 - Add new transport type for Cisco usNIC

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
  IB/core: Re-enable create_flow/destroy_flow uverbs
  IB/core: extended command: an improved infrastructure for uverbs commands
  IB/core: Remove ib_uverbs_flow_spec structure from userspace
  IB/core: Use a common header for uverbs flow_specs
  IB/core: Make uverbs flow structure use names like verbs ones
  IB/core: Rename 'flow' structs to match other uverbs structs
  IB/core: clarify overflow/underflow checks on ib_create/destroy_flow
  IB/ucma: Convert use of typedef ctl_table to struct ctl_table
  IB/cm: Convert to using idr_alloc_cyclic()
  IB/mlx5: Fix page shift in create CQ for userspace
  IB/mlx4: Fix device max capabilities check
  IB/mlx5: Fix list_del of empty list
  IB/mlx5: Remove dead code
  IB/core: Encorce MR access rights rules on kernel consumers
  IB/mlx4: Fix endless loop in resize CQ
  RDMA/cma: Remove unused argument and minor dead code
  RDMA/ucma: Discard events for IDs not yet claimed by user space
  IB/core: Add Cisco usNIC rdma node and transport types
  RDMA/nes: Remove self-assignment from nes_query_qp()
  IB/srp: Report receive errors correctly
  ...
2013-11-18 15:36:04 -08:00
Roland Dreier
b4fdf52b3f Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'srp' into for-next 2013-11-17 08:22:19 -08:00
Linus Torvalds
9073e1a804 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual earth-shaking, news-breaking, rocket science pile from
  trivial.git"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
  doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
  doc: add missing files to timers/00-INDEX
  timekeeping: Fix some trivial typos in comments
  mm: Fix some trivial typos in comments
  irq: Fix some trivial typos in comments
  NUMA: fix typos in Kconfig help text
  mm: update 00-INDEX
  doc: Documentation/DMA-attributes.txt fix typo
  DRM: comment: `halve' -> `half'
  Docs: Kconfig: `devlopers' -> `developers'
  doc: typo on word accounting in kprobes.c in mutliple architectures
  treewide: fix "usefull" typo
  treewide: fix "distingush" typo
  mm/Kconfig: Grammar s/an/a/
  kexec: Typo s/the/then/
  Documentation/kvm: Update cpuid documentation for steal time and pv eoi
  treewide: Fix common typo in "identify"
  __page_to_pfn: Fix typo in comment
  Correct some typos for word frequency
  clk: fixed-factor: Fix a trivial typo
  ...
2013-11-15 16:47:22 -08:00
Nicholas Bellinger
04d9cd1224 ib_isert: Avoid duplicate iscsit_increment_maxcmdsn call
This patch avoids a duplicate iscsit_increment_maxcmdsn() call for
ISER_IB_RDMA_WRITE within isert_map_rdma() + isert_reg_rdma_frwr(),
which will already be occuring once during isert_put_datain() ->
iscsit_build_rsp_pdu() operation.

It also removes the local conn->stat_sn assignment + increment,
and changes the third parameter to iscsit_build_rsp_pdu() to
signal this should be done by iscsi_target_mode code.

Tested-by: Moussa Ba <moussaba@micron.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-12 18:05:07 -08:00
Vu Pham
f01b9f7339 iser-target: Avoid using FRMR for single dma entry requests
This patch changes isert_reg_rdma_frwr() to not use FRMR for single
dma entry requests from small I/Os, in order to avoid the associated
memory registration overhead.

Using DMA MR is sufficient here for the single dma entry requests,
and addresses a >= v3.12 performance regression.

Signed-off-by: Vu Pham <vu@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-12 13:07:52 -08:00
Bart Van Assche
cd4e38542a IB/srp: Report receive errors correctly
The IB spec does not guarantee that the opcode is available in error
completions.  Hence do not rely on it.  See also commit 948d1e889e
("IB/srp: Introduce srp_handle_qp_err()").

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@vger.kernel.org> # v3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:18 -08:00
Bart Van Assche
99b6697a50 IB/srp: Avoid offlining operational SCSI devices
If SCSI commands are submitted with a SCSI request timeout that is
lower than the the IB RC timeout, it can happen that the SCSI error
handler has already started device recovery before transport layer
error handling starts.  So it can happen that the SCSI error handler
tries to abort a SCSI command after it has been reset by
srp_rport_reconnect().

Tell the SCSI error handler that such commands have finished and that
it is not necessary to continue its recovery strategy for commands
that have been reset by srp_rport_reconnect().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:18 -08:00
Vu Pham
65d7dd2f34 IB/srp: Remove target from list before freeing Scsi_Host structure
Remove an SRP target from the SRP target list before invoking the last
scsi_host_put() call.  This change is necessary because that last put
frees the memory that holds the srp_target_port structure.

This patch prevents the following kernel oops:

    RIP: 0010:[<ffffffff810b00d0>] __lock_acquire+0x500/0x1570
    Call Trace:
     [<ffffffff810b11e4>] lock_acquire+0xa4/0x120
     [<ffffffff81531206>] _spin_lock+0x36/0x70
     [<ffffffffa01b6d8f>] srp_remove_work+0xef/0x180 [ib_srp]
     [<ffffffff8109125c>] worker_thread+0x21c/0x3d0
     [<ffffffff81096e86>] kthread+0x96/0xa0
     [<ffffffff8100c20a>] child_rip+0xa/0x20

Signed-off-by: Vu Pham <vuhuong@mellanox.com>

[ bvanassche - Modified path description and CC'ed stable. ]

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:17 -08:00
Jack Wang
71444b9781 IB/srp: Add change_queue_depth and change_queue_type support
Currently, it's not possible to change queue depth for a device behind
SRP host. Sometimes, we need to adjust queue_depth for performance
reason (eg storage busy, we need lower queue_depth to avoid running
into SCSI error handler), so this patch add support for SRP driver.

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:17 -08:00
Bart Van Assche
4d73f95f70 IB/srp: Make queue size configurable
Certain storage configurations, e.g. a sufficiently large array of
hard disks in a RAID configuration, need a queue depth above 64 to
achieve optimal performance. Hence make the queue depth configurable.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Tested-by: Jack Wang <xjtuwjp@gmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:17 -08:00
Bart Van Assche
b81d00bddf IB/srp: Introduce srp_alloc_req_data()
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Vu Pham <vu@mellanox.com>
Cc: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:17 -08:00
Bart Van Assche
848b3082db IB/srp: Export sgid to sysfs
On an initiator system with multiple IB ports it is not yet possible
to figure out what the originating port of an SRP connection is. Hence
make the source GID available in sysfs.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:16 -08:00
Bart Van Assche
a95cadb9da IB/srp: Add periodic reconnect functionality
After a transport layer occurred, periodically try to reconnect
to the target until the dev_loss timer expires.  Protect the
callback functions that can be invoked from inside the SCSI EH
against concurrent invocation with srp_reconnect_rport() via the
rport mutex. Change the default dev_loss_tmo from 60s into 600s
to give the reconnect mechanism a chance to kick in.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:16 -08:00
Bart Van Assche
8c64e4531c scsi_transport_srp: Add periodic reconnect support
Add support for periodically reconnecting to an SRP target until
the dev_loss timer expires. After the tenth reconnection attempt,
gradually slow down subsequent reconnect attempts.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:16 -08:00
Bart Van Assche
c1120f8981 IB/srp: Start timers if a transport layer error occurs
Start the reconnect timer, fast_io_fail timer and dev_loss timers if a
transport layer error occurs.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:15 -08:00
Bart Van Assche
ed9b2264fb IB/srp: Use SRP transport layer error recovery
Enable fast_io_fail_tmo and dev_loss_tmo functionality for the IB SRP
initiator.  Add kernel module parameters that allow to specify default
values for these parameters.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:15 -08:00
Bart Van Assche
9dd69a600a IB/srp: Keep rport as long as the IB transport layer
Keep the rport data structure around after srp_remove_host() has
finished until cleanup of the IB transport layer has finished
completely. This is necessary because later patches use the rport
pointer inside the queuecommand callback. Without this patch
accessing the rport from inside a queuecommand callback is racy
because srp_remove_host() must be invoked before scsi_remove_host()
and because the queuecommand callback could get invoked after
srp_remove_host() has finished. In other words, without this patch
the queuecommand callback can get invoked after the rport data
structure has been freed.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:15 -08:00
Vu Pham
7bb312e4a2 IB/srp: Make transport layer retry count configurable
Allow the InfiniBand RC retry count to be configured by the user as an
option in the target login string.  Reducing this retry count allows to
reduce the path failover time.

Signed-off-by: Vu Pham <vu@mellanox.com>

[ bvanassche: Rewrote patch description / changed default retry count ]

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:15 -08:00
Michal Schmidt
7f1a38671c IPoIB: lower NAPI weight
Since commit 82dc3c63c6 ("net: introduce NAPI_POLL_WEIGHT")
netif_napi_add() produces an error message if a NAPI poll weight
greater than 64 is requested.

Use the standard NAPI weight.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:50 -08:00
Erez Shitrit
94232d9ce8 IPoIB: Start multicast join process only on active ports
The driver starts the mcast_join task whenever the netdev interface is
UP without relation to the underlying IB port state.

Until the port state is ACTIVE all the join requests are irrelevant,
and the IB core returns -EINVAL. So the user will see errors such as:
"multicast join failed for ff12:401b:... , status -22".

Instead, have ipoib_mcast_join_task() return when the port is not active.

It will be called again when the port state is changed and the
low-level driver triggers the IB_EVENT_PORT_ACTIVE event or the
IB_EVENT_CLIENT_REREGISTER event.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:49 -08:00
Erez Shitrit
a39c52ab88 IPoIB: Add path query flushing in ipoib_ib_dev_cleanup
The path_rec_completion() callback may be invoked asynchronously even
at the middle of "driver uninit" process.  This can lead to scheduling
a task that tries to touch members of the priv object that are no
longer valid.  For example the function cm_create_tx_qp can attempt to
create qp with no valid priv->pd object.

The following crash is one of the results:
RIP: 0010:[<ffffffffa021bb47>]  [<ffffffffa021bb47>] ipoib_cm_create_tx_qp+0x57/0x90 [ib_ipoib]
Process ipoib (pid: 5916, threadinfo ffff8803786e4000, task ffff8804150e1500)
Stack:
Call Trace:
[<ffffffff81309ef0>] ? get_random_bytes+0x20/0x30
[<ffffffffa021be2a>] ipoib_cm_tx_init+0xca/0x340 [ib_ipoib]
[<ffffffffa021f765>] ipoib_cm_tx_start+0x215/0x3f0 [ib_ipoib]
[<ffffffffa021f550>] ? ipoib_cm_tx_start+0x0/0x3f0 [ib_ipoib]
[<ffffffff8108b2b0>] worker_thread+0x170/0x2a0
[<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8108b140>] ? worker_thread+0x0/0x2a0
[<ffffffff81090886>] kthread+0x96/0xa0
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffff810907f0>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20

Fix that by flushing all pending path queries at this point.

Signed-off-by: Alex Markuze <markuze@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:49 -08:00
Erez Shitrit
a9c8ba5884 IPoIB: Fix usage of uninitialized multicast objects
The driver should avoid calling ib_sa_free_multicast on the mcast->mc
object until it finishes its initialization state.  Otherwise we can
crash when ipoib_mcast_dev_flush() attempts to use the uninitialized
multicast object.

Instead, only call wait_for_completion() for multicast entries that
started the join process, meaning that ib_sa_join_multicast() finished.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:49 -08:00
Erez Shitrit
aede25011f IPoIB: Avoid flushing the driver workqueue on dev_down
The driver should not flush the whole workqueue when only one work (the
pkey poll one) needs to be cancelled.  Use cancel_delayed_work_sync()
instead.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:49 -08:00
Erez Shitrit
f47944cc2d IPoIB: Fix deadlock between dev_change_flags() and __ipoib_dev_flush()
When ipoib interface is going down it takes all of its children with
it, under mutex.

For each child, dev_change_flags() is called.  That function calls
ipoib_stop() via the ndo, and causes flush of the workqueue.
Sometimes in the workqueue an __ipoib_dev_flush work() is waiting and
when invoked tries to get the same mutex, which leads to a deadlock,
as seen below.

The solution is to switch to rw-sem instead of mutex.

The deadlock:
[11028.165303]  [<ffffffff812b0977>] ? vgacon_scroll+0x107/0x2e0
[11028.171844]  [<ffffffff814eaac5>] schedule_timeout+0x215/0x2e0
[11028.178465]  [<ffffffff8105a5c3>] ? perf_event_task_sched_out+0x33/0x80
[11028.185962]  [<ffffffff814ea743>] wait_for_common+0x123/0x180
[11028.192491]  [<ffffffff8105fa40>] ? default_wake_function+0x0/0x20
[11028.199504]  [<ffffffff814ea85d>] wait_for_completion+0x1d/0x20
[11028.206224]  [<ffffffff8108b4f1>] flush_cpu_workqueue+0x61/0x90
[11028.212948]  [<ffffffff8108b5a0>] ? wq_barrier_func+0x0/0x20
[11028.219375]  [<ffffffff8108bfc4>] flush_workqueue+0x54/0x80
[11028.225712]  [<ffffffffa05a0576>] ipoib_mcast_stop_thread+0x66/0x90 [ib_ipoib]
[11028.233988]  [<ffffffffa059ccea>] ipoib_ib_dev_down+0x6a/0x100 [ib_ipoib]
[11028.241678]  [<ffffffffa059849a>] ipoib_stop+0x8a/0x140 [ib_ipoib]
[11028.248692]  [<ffffffff8142adf1>] dev_close+0x71/0xc0
[11028.254447]  [<ffffffff8142a631>] dev_change_flags+0xa1/0x1d0
[11028.261062]  [<ffffffffa059851b>] ipoib_stop+0x10b/0x140 [ib_ipoib]
[11028.268172]  [<ffffffff8142adf1>] dev_close+0x71/0xc0
[11028.273922]  [<ffffffff8142a631>] dev_change_flags+0xa1/0x1d0
[11028.280452]  [<ffffffff8148f20b>] devinet_ioctl+0x5eb/0x6a0
[11028.286786]  [<ffffffff814903b8>] inet_ioctl+0x88/0xa0
[11028.292633]  [<ffffffff8141591a>] sock_ioctl+0x7a/0x280
[11028.298576]  [<ffffffff81189012>] vfs_ioctl+0x22/0xa0
[11028.304326]  [<ffffffff81140540>] ? unmap_region+0x110/0x130
[11028.310756]  [<ffffffff811891b4>] do_vfs_ioctl+0x84/0x580
[11028.316897]  [<ffffffff81189731>] sys_ioctl+0x81/0xa0

and

11028.017533]  [<ffffffff8105a5c3>] ? perf_event_task_sched_out+0x33/0x80
[11028.025030]  [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
[11028.031945]  [<ffffffff814eb2ae>] __mutex_lock_slowpath+0x13e/0x180
[11028.039053]  [<ffffffff814eb14b>] mutex_lock+0x2b/0x50
[11028.044910]  [<ffffffffa059f7e7>] __ipoib_ib_dev_flush+0x37/0x210 [ib_ipoib]
[11028.052894]  [<ffffffffa059fa00>] ? ipoib_ib_dev_flush_light+0x0/0x20 [ib_ipoib]
[11028.061363]  [<ffffffffa059fa17>] ipoib_ib_dev_flush_light+0x17/0x20 [ib_ipoib]
[11028.069738]  [<ffffffff8108b120>] worker_thread+0x170/0x2a0
[11028.076068]  [<ffffffff81090990>] ? autoremove_wake_function+0x0/0x40
[11028.083374]  [<ffffffff8108afb0>] ? worker_thread+0x0/0x2a0
[11028.089709]  [<ffffffff81090626>] kthread+0x96/0xa0
[11028.095266]  [<ffffffff8100c0ca>] child_rip+0xa/0x20
[11028.100921]  [<ffffffff81090590>] ? kthread+0x0/0xa0
[11028.106573]  [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
[11028.112423] INFO: task ifconfig:23640 blocked for more than 120 seconds.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:49 -08:00
Tal Alon
22252b4e09 IPoIB: Change CM skb memory allocation to be non-atomic during init
Change CM skb memory allocation to use GFP_KERNEL when possible.

During device init there's no need to use GFP_ATOMIC when allocating
memory for the CM skbs -- use GFP_KERNEL instead.

Signed-off-by: Tal Alon <talal@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:48 -08:00
Erez Shitrit
c2bb5628db IPoIB: Fix crash in dev_open error flow
If napi has never been enabled when calling ipoib_ib_dev_stop, a
kernel crash occurs, because the verbs layer completion handler
(ipoib_ib_completion) calls napi_schedule unconditionally.

If the napi structure passed in the napi_schedule call has not
been initialized, napi will crash.

The cleanest solution is to simply enable napi before calling
ipoib_ib_dev_stop in the dev_open error flow. (dev_stop then
immediately disables napi).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:48 -08:00
Nicholas Bellinger
4a9a6c8d53 target: Drop left-over se_lun->lun_cmd_list shutdown code
Now with percpu refcounting for se_lun in place, go ahead and drop
the legacy per se_cmd accounting for se_lun shutdown.

This includes __transport_clear_lun_from_sessions(), the associated
transport_lun_wait_for_tasks() logic, along with a handful of now
unused se_cmd structure members and ->transport_state bits.

Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-07 14:25:02 -08:00
Nicholas Bellinger
95b60f0788 ib_isert: Add support for completion interrupt coalescing
This patch adds support for completion interrupt coalescing that
allows only every ISERT_COMP_BATCH_COUNT (8) to set IB_SEND_SIGNALED,
thus avoiding completion interrupts for every posted iser_tx_desc.

The batch processing is done using a per isert_conn llist that once
IB_SEND_SIGNALED has been set is saved to tx_desc->comp_llnode_batch,
and completion processing of previously posted iser_tx_descs is done
in a single shot from within isert_send_completion() code.

Note this is only done for response PDUs from ISCSI_OP_SCSI_CMD, and
all other control type of PDU responses will force an implicit batch
drain to occur.

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-06 12:48:22 -08:00
Vu Pham
0a66614b93 iser-target: check device before dereferencing its variable
This patch changes isert_connect_release() to correctly check for
the existence struct isert_device *device before checking for
isert_device->use_frwr.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-23 21:42:33 -07:00
Masanari Iida
8c88126bbb treewide: Fix typo in Kconfig
Correct spelling typo in Kconfig.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-10-14 15:23:02 +02:00
Jack Wang
c807f64340 ib_srpt: always set response for task management
The SRP specification requires:

  "Response data shall be provided in any SRP_RSP response that is sent in
   response to an SRP_TSK_MGMT request (see 6.7). The information in the
   RSP_CODE field (see table 24) shall indicate the completion status of
   the task management function."

So fix this to avoid the SRP initiator interprets task management functions
that succeeded as failed.

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Cc: stable@vger.kernel.org # 3.3+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-03 04:23:17 -07:00
Nicholas Bellinger
0b41d6ca61 ib_srpt: Destroy cm_id before destroying QP.
This patch fixes a bug where ib_destroy_cm_id() was incorrectly being called
after srpt_destroy_ch_ib() had destroyed the active QP.

This would result in the following failed SRP_LOGIN_REQ messages:

Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff1762bd, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c903009f8f41)
Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff1758f9, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 2 (guid=0xfe80000000000000:0x2c903009f8f42)
Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff175941, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 2 (guid=0xfe80000000000000:0x2c90300a3cfb2)
Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff176299, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300a3cfb1)
mlx4_core 0000:84:00.0: command 0x19 failed: fw status = 0x9
rejected SRP_LOGIN_REQ because creating a new RDMA channel failed.
Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff176299, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300a3cfb1)
mlx4_core 0000:84:00.0: command 0x19 failed: fw status = 0x9
rejected SRP_LOGIN_REQ because creating a new RDMA channel failed.
Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff176299, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300a3cfb1)

Reported-by: Navin Ahuja <navin.ahuja@saratoga-speed.com>
Cc: stable@vger.kernel.org # 3.3+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-01 21:27:30 -07:00
Linus Torvalds
48efe453e6 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Lots of activity again this round for I/O performance optimizations
  (per-cpu IDA pre-allocation for vhost + iscsi/target), and the
  addition of new fabric independent features to target-core
  (COMPARE_AND_WRITE + EXTENDED_COPY).

  The main highlights include:

   - Support for iscsi-target login multiplexing across individual
     network portals
   - Generic Per-cpu IDA logic (kent + akpm + clameter)
   - Conversion of vhost to use per-cpu IDA pre-allocation for
     descriptors, SGLs and userspace page pointer list
   - Conversion of iscsi-target + iser-target to use per-cpu IDA
     pre-allocation for descriptors
   - Add support for generic COMPARE_AND_WRITE (AtomicTestandSet)
     emulation for virtual backend drivers
   - Add support for generic EXTENDED_COPY (CopyOffload) emulation for
     virtual backend drivers.
   - Add support for fast memory registration mode to iser-target (Vu)

  The patches to add COMPARE_AND_WRITE and EXTENDED_COPY support are of
  particular significance, which make us the first and only open source
  target to support the full set of VAAI primitives.

  Currently Linux clients are lacking upstream support to actually
  utilize these primitives.  However, with server side support now in
  place for folks like MKP + ZAB working on the client, this logic once
  reserved for the highest end of storage arrays, can now be run in VMs
  on their laptops"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits)
  target/iscsi: Bump versions to v4.1.0
  target: Update copyright ownership/year information to 2013
  iscsi-target: Bump default TCP listen backlog to 256
  target: Fix >= v3.9+ regression in PR APTPL + ALUA metadata write-out
  iscsi-target; Bump default CmdSN Depth to 64
  iscsi-target: Remove unnecessary wait_for_completion in iscsi_get_thread_set
  iscsi-target: Add thread_set->ts_activate_sem + use common deallocate
  iscsi-target: Fix race with thread_pre_handler flush_signals + ISCSI_THREAD_SET_DIE
  target: remove unused including <linux/version.h>
  iser-target: introduce fast memory registration mode (FRWR)
  iser-target: generalize rdma memory registration and cleanup
  iser-target: move rdma wr processing to a shared function
  target: Enable global EXTENDED_COPY setup/release
  target: Add Third Party Copy (3PC) bit in INQUIRY response
  target: Enable EXTENDED_COPY setup in spc_parse_cdb
  target: Add support for EXTENDED_COPY copy offload emulation
  target: Avoid non-existent tg_pt_gp_mem in target_alua_state_check
  target: Add global device list for EXTENDED_COPY
  target: Make helpers non static for EXTENDED_COPY command setup
  target: Make spc_parse_naa_6h_vendor_specific non static
  ...
2013-09-12 16:11:45 -07:00
Nicholas Bellinger
4c76251e8e target: Update copyright ownership/year information to 2013
Update copyright ownership/year information for target-core,
loopback, iscsi-target, tcm_qla2xx, vhost and iser-target.

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-10 20:23:36 -07:00
Vu Pham
59464ef4fb iser-target: introduce fast memory registration mode (FRWR)
This model was introduced in 00f7ec36c "RDMA/core: Add memory
management extensions support" and works when the IB device
supports the IB_DEVICE_MEM_MGT_EXTENSIONS capability.

Upon creating the isert device, ib_isert will test whether the HCA
supports FRWR. If supported then set the flag and assign
function pointers that handle fast registration and deregistration
of appropriate resources (fast_reg descriptors).

When new connection coming in, ib_isert will check frwr flag and
create frwr resouces, if fail to do it will switch back to
old model of using global dma key and turn off the frwr support.

Registration is done using posting IB_WR_FAST_REG_MR to the QP and
invalidations using posting IB_WR_LOCAL_INV.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-10 16:48:51 -07:00
Vu Pham
d40945d8c2 iser-target: generalize rdma memory registration and cleanup
Current driver uses global dma key to register the memory pointed
by sg list provided by the target core.

This is the preparation step for adding more methods like fast
path memory registration, make the reg/unreg calls be function
pointers.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-10 16:48:50 -07:00
Vu Pham
90ecc6e251 iser-target: move rdma wr processing to a shared function
isert_put_datain() and isert_get_dataout() share a lot of code
in rdma wr processing, move this common code to a shared function.

Use isert_unmap_cmd to cleanup for RDMA_READ completion.
Remove duplicate field in isert_cmd and isert_rdma_wr structs
Change misc debug messages to track isert_cmd

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-10 16:48:49 -07:00
Nicholas Bellinger
d703ce2f7f iscsi/iser-target: Convert to command priv_size usage
This command converts iscsi/isert-target to use allocations based on
iscsit_transport->priv_size within iscsit_allocate_cmd(), instead of
using an embedded isert_cmd->iscsi_cmd.

This includes removing iscsit_transport->alloc_cmd() usage, along
with updating isert-target code to use iscsit_priv_cmd().

Also, remove left-over iscsit_transport->release_cmd() usage for
direct calls to iscsit_release_cmd(), and drop the now unused
lio_cmd_cache and isert_cmd_cache.

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 14:29:21 -07:00
Nicholas Bellinger
6faaa85f37 iser-target: Updates for login negotiation multi-plexing support
This patch updates iser-target code to support login negotiation
multi-plexing.  This includes only using isert_conn->conn_login_comp
for the first login request PDU, pushing the subsequent processing
to iscsi_conn->login_work -> iscsi_target_do_login_rx(), and turning
isert_get_login_rx() into a NOP.

v3 changes:
   - Drop unnecessary LOGIN_FLAGS_READ_ACTIVE bit set in
     isert_rx_login_req()

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-09 14:27:21 -07:00
Linus Torvalds
7c049d0869 Main batch of InfiniBand/RDMA changes for 3.12 merge window:
- Large ocrdma HW driver update: add "fast register" work requests,
    fixes, cleanups
  - Add receive flow steering support for raw QPs
  - Fix IPoIB neighbour race that leads to crash
  - iSER updates including support for using "fast register" memory
    registration
  - IPv6 support for iWARP
  - XRC transport fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABCAAGBQJSJ2drAAoJEENa44ZhAt0hpaUQAJ6EdNFh2bon9c/uHz1lw58A
 DIvVniUwWGCRgpbsv/IxZDBVX3G5IKSW6U0Y3/JxMXeIF/cGOUaJZgCeKPiYNoNA
 yxiEGI0ffvpWGAJUHSE2ARJKvgdjrpqGo5UzsiinEhJx3uczeZBcooosQTWDXxya
 /qSWH3rARkic9abuaizkuVzdWEjWLNjyh2bWnqJ4HNZplE8RfSK4bk8QtvEmpn9Z
 dNBKFFujysfHHGflLuSkFAkP1NdqlZeQ4/2uzD23p3YofbJwrJoJNFxkCfx3UUml
 fjPptlhU6kZMCwSXsD24tAV8Exr/CCgmxriFIN/Xqhu4gvBELScRPPqPqFquhtXI
 pP5hbG9/9P5C8BLgABe6IAvlU1lUyraR67bA78nYeKm2JpATE6T23Nx7nUgMgzRS
 Ee3ZvZGZvk8BZ7zyQh8D7Aig1XOtzqA9D5nLUyEJ2aRPSnXJxyi0S3Fbn0vmOMb7
 8CF+99KECG8fb4sjNAjX5Zm48+7WJd+WCd3t89oUzCbJKKpa1Chctph8VH7dEfke
 JkEjOJhuAGqau4bIMZ2nZkISoTfSD7vbjPAxMPASS7fOdR7fe6AqDn9/LMAhWjpw
 4Fb25ZW7rYf9+6jqA4MgMRFCTkXhR25FsqUUL9h8NaWai4F1dfVoZHbRvGIZah5n
 5HQupXGquwES4y6pvHjc
 =rabl
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull main batch of InfiniBand/RDMA changes from Roland Dreier:
 - Large ocrdma HW driver update: add "fast register" work requests,
   fixes, cleanups
 - Add receive flow steering support for raw QPs
 - Fix IPoIB neighbour race that leads to crash
 - iSER updates including support for using "fast register" memory
   registration
 - IPv6 support for iWARP
 - XRC transport fixes

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (54 commits)
  RDMA/ocrdma: Fix compiler warning about int/pointer size mismatch
  IB/iser: Fix redundant pointer check in dealloc flow
  IB/iser: Fix possible memory leak in iser_create_frwr_pool()
  IB/qib: Move COUNTER_MASK definition within qib_mad.h header guards
  RDMA/ocrdma: Fix passing wrong opcode to modify_srq
  RDMA/ocrdma: Fill PVID in UMC case
  RDMA/ocrdma: Add ABI versioning support
  RDMA/ocrdma: Consider multiple SGES in case of DPP
  RDMA/ocrdma: Fix for displaying proper link speed
  RDMA/ocrdma: Increase STAG array size
  RDMA/ocrdma: Dont use PD 0 for userpace CQ DB
  RDMA/ocrdma: FRMA code cleanup
  RDMA/ocrdma: For ERX2 irrespective of Qid, num_posted offset is 24
  RDMA/ocrdma: Fix to work with even a single MSI-X vector
  RDMA/ocrdma: Remove the MTU check based on Ethernet MTU
  RDMA/ocrdma: Add support for fast register work requests (FRWR)
  RDMA/ocrdma: Create IRD queue fix
  IB/core: Better checking of userspace values for receive flow steering
  IB/mlx4: Add receive flow steering support
  IB/core: Export ib_create/destroy_flow through uverbs
  ...
2013-09-05 09:39:27 -07:00
Roland Dreier
82af24ac6f Merge branches 'cxgb4', 'flowsteer', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-next 2013-09-03 09:01:08 -07:00
Sagi Grimberg
2e02d653fe IB/iser: Fix redundant pointer check in dealloc flow
This bug was discovered by Smatch static checker run by Dan Carpenter.
If in free_rx_descriptors(), rx_descs are not NULL then the iser
device is definately not NULL, so no need to check it before
dereferencing it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-09-02 21:26:16 -07:00
Roi Dayan
27ae2d1ea5 IB/iser: Fix possible memory leak in iser_create_frwr_pool()
Fix leak where desc is not being freed in error flows.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-09-02 21:24:08 -07:00
Or Gerlitz
6a06a4b8cf [SCSI] IB/iser: Add Discovery support
To run discovery over iSER we need to advertize the CAP_TEXT_NEGO capability
towards user space. Also need to make sure the login RX buffer is posted when
SendTargets TEXT PDUs are sent. For that end, we use a setting of the
ISCSI_PARAM_DISCOVERY_SESS iscsi param as an indication that this is
discovery session.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-08-26 18:53:49 +04:00
Jim Foraker
49b8e74438 IPoIB: Fix race in deleting ipoib_neigh entries
In several places, this snippet is used when removing neigh entries:

	list_del(&neigh->list);
	ipoib_neigh_free(neigh);

The list_del() removes neigh from the associated struct ipoib_path, while
ipoib_neigh_free() removes neigh from the device's neigh entry lookup
table.  Both of these operations are protected by the priv->lock
spinlock.  The table however is also protected via RCU, and so naturally
the lock is not held when doing reads.

This leads to a race condition, in which a thread may successfully look
up a neigh entry that has already been deleted from neigh->list.  Since
the previous deletion will have marked the entry with poison, a second
list_del() on the object will cause a panic:

  #5 [ffff8802338c3c70] general_protection at ffffffff815108c5
     [exception RIP: list_del+16]
     RIP: ffffffff81289020  RSP: ffff8802338c3d20  RFLAGS: 00010082
     RAX: dead000000200200  RBX: ffff880433e60c88  RCX: 0000000000009e6c
     RDX: 0000000000000246  RSI: ffff8806012ca298  RDI: ffff880433e60c88
     RBP: ffff8802338c3d30   R8: ffff8806012ca2e8   R9: 00000000ffffffff
     R10: 0000000000000001  R11: 0000000000000000  R12: ffff8804346b2020
     R13: ffff88032a3e7540  R14: ffff8804346b26e0  R15: 0000000000000246
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  #6 [ffff8802338c3d38] ipoib_cm_tx_handler at ffffffffa066fe0a [ib_ipoib]
  #7 [ffff8802338c3d98] cm_process_work at ffffffffa05149a7 [ib_cm]
  #8 [ffff8802338c3de8] cm_work_handler at ffffffffa05161aa [ib_cm]
  #9 [ffff8802338c3e38] worker_thread at ffffffff81090e10
 #10 [ffff8802338c3ee8] kthread at ffffffff81096c66
 #11 [ffff8802338c3f48] kernel_thread at ffffffff8100c0ca

We move the list_del() into ipoib_neigh_free(), so that deletion happens
only once, after the entry has been successfully removed from the lookup
table.  This same behavior is already used in ipoib_del_neighs_by_gid()
and __ipoib_reap_neigh().

Signed-off-by: Jim Foraker <foraker1@llnl.gov>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-13 11:57:37 -07:00
Sagi Grimberg
5587856c96 IB/iser: Introduce fast memory registration model (FRWR)
Newer HCAs and Virtual functions may not support FMRs but rather a fast
registration model, which we call FRWR - "Fast Registration Work Requests".

This model was introduced in 00f7ec36c ("RDMA/core: Add memory management
extensions support") and works when the IB device supports the
IB_DEVICE_MEM_MGT_EXTENSIONS capability.

Upon creating the iser device iser will test whether the HCA supports
FMRs.  If no support for FMRs, check if IB_DEVICE_MEM_MGT_EXTENSIONS
is supported and assign function pointers that handle fast
registration and allocation of appropriate resources (fast_reg
descriptors).

Registration is done using posting IB_WR_FAST_REG_MR to the QP and
invalidations using posting IB_WR_LOCAL_INV.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:10 -07:00
Sagi Grimberg
e657571b76 IB/iser: Place the fmr pool into a union in iser's IB conn struct
This is preparation step for other memory registration methods to be
added.  In addition, change reg/unreg routines signature to indicate
they use FMRs.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:09 -07:00
Sagi Grimberg
919fc274ef IB/iser: Handle unaligned SG in separate function
This routine will be shared with other rdma management schemes.  The
bounce buffer solution for unaligned SG-lists and the sg_to_page_vec
routine are likely to be used for other registration schemes and not
just FMR.

Move them out of the FMR specific code, and call them from there.
Later they will be called also from other reg methods code.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:09 -07:00
Sagi Grimberg
b4e155ffbb IB/iser: Generalize rdma memory registration
Currently the driver uses FMRs as the only means to register the
memory pointed by SG provided by the SCSI mid-layer with the RDMA
device.

As preparation step for adding more methods for fast path memory
registration, make the alloc/free and reg/unreg calls function
pointers, which are for now just set to the existing FMR ones.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:09 -07:00
Shlomo Pongratz
b7f0451309 IB/iser: Accept session->cmds_max from user space
Use cmds_max passed from user space to be the number of PDUs to be
supported for the session instead of hard-coded ISCSI_DEF_XMIT_CMDS_MAX.
This allow controlling the max number of SCSI commands for the session.
Also don't ignore the qdepth passed from user space.

Derive from session->cmds_max the actual number of RX buffers and FMR
pool size to allocate during the connection bind phase.

Since the iser transport connection is established before the iscsi
session/connection are created and bound, we still use one hard-coded
quantity ISER_DEF_XMIT_CMDS_MAX to compute the maximum number of
work-requests to be supported by the RC QP used for the connection.

The above quantity is made to be a power of two between ISCSI_TOTAL_CMDS_MIN
(16) and ISER_DEF_XMIT_CMDS_MAX (512) inclusive.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:08 -07:00
Shlomo Pongratz
986db0d6c0 IB/iser: Restructure allocation/deallocation of connection resources
This is a preparation step to a patch that accepts the number of max
SCSI commands to be supported a session from user space iSCSI tools.

Move the allocation of the login buffer, FMR pool and its associated
page vector from iser_create_ib_conn_res() (which is called prior when
we actually know how many commands should be supported) to
iser_alloc_rx_descriptors() (which is called during the iscsi
connection bind step where this quantity is known).

Also do small refactoring around the deallocation to make that path
similar to the allocation one.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:08 -07:00
Or Gerlitz
f91424cf5b IB/iser: Use proper debug level value for info prints
Commit 4f36388261 ("IB/iser: Move informational messages from error
to info level") set info prints to be emitted at a lower debug level
than warning prints, which is a bit odd.  Fix that.

Also move the prints on unaligned SG from warning to debug level.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:08 -07:00
Erez Shitrit
c290414169 IPoIB: Fix pkey change flow for virtualization environments
IPoIB's required behaviour w.r.t to the pkey used by the device is the following:

- For "parent" interfaces (e.g ib0, ib1, etc) who are created
  automatically as a result of hot-plug events from the IB core, the
  driver needs to take whatever pkey vlaue it finds in index 0, and
  stick to that index.

- For child interfaces (e.g ib0.8001, etc) created by admin directive,
  the driver needs to use and stick to the value provided during its
  creation.

In SR-IOV environment its possible for the VF probe to take place
before the cloud management software provisions the suitable pkey for
the VF in the paravirtualed PKEY table index 0. When this is the case,
the VF IB stack will find in index 0 an invalide pkey, which is all
zeros.

Moreover, the cloud managment can assign the pkey value at index 0 at
any time of the guest life cycle.

The correct behavior for IPoIB to address these requirements for
parent interfaces is to use PKEY_CHANGE event as trigger to optionally
re-init the device pkey value and re-create all the relevant resources
accordingly, if the value of the pkey in index 0 has changed (from
invalid to valid or from valid value X to invalid value Y).

This patch enhances the heavy flushing code which is triggered by pkey
change event, to behave correctly for parent devices. For child
devices, the code remains the same, namely chases pkey value and not
index.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-31 14:23:44 -07:00
Or Gerlitz
3d790a4c26 IPoIB: Make sure child devices use valid/proper pkeys
Make sure that the IB invalid pkey (0x0000 or 0x8000) isn't used for
child devices.

Also, make sure to always set the full membership bit for the pkey of
devices created by rtnl link ops.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-31 14:23:40 -07:00
Linus Torvalds
c552441373 Main batch of InfiniBand/RDMA changes for 3.11 merge window:
- AF_IB (native IB addressing) for CMA from Sean Hefty
  - New mlx5 driver for Mellanox Connect-IB adapters (including post merge request fixes)
  - SRP fixes from Bart Van Assche (including fix to first merge request)
  - qib HW driver updates
  - Resurrection of ocrdma HW driver development
  - uverbs conversion to create fds with O_CLOEXEC set
  - Other small changes and fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJR30TKAAoJEENa44ZhAt0h854P/jvAhK5u+XTM5VyjAi0DKJ7P
 bWcsu+KxbOIFnjEdsYQl1mGP44gdO8GPZp7+JR5nDHDRpw9K76qy6QQiPbaF6Y8D
 cZH8Xlq4hzBfElTWBkExEemPrVUUq77j03FE9TBatdLAtEyYkgrNyqr7Ys6zVwVK
 ugR8nAahvnB7Jh1tsyZBBd9kfbWtXJnaGC8/Zk3Na4n4zXRAbr0DcnRF0sncTL38
 VFnWbi33OQAxu5bsb2jGec/SNP3BbNwspFPjSCKqiiItRaCj13JiHhrKKvVk4RZe
 hIRnPH47kjLRp2/PwBo6o+gTXZuRg48VGBx4CKUTwx1nCzPPN1iz9ZOfqUv9Qwcv
 LX8mxC7QS/Yvud4KeEBsj6kotb80EkRF2KV5RkIKCxQiwetGD9127bZylC8ttxGw
 2f6MzYtAGD4R4C10lO8N+59VugSg1xAvwsqz0a/jy2XyVHbI1ugQedzkB20x5WPY
 51S08ABvtU9yIxIYrw2VEaa/5WN+XJ6+LpG9QBAGXdMLiCiiAe7n/YzyXI6AgwaW
 Jl/uKr6H6/jEHUHKwkyqsmbpVGPhtGWu8deyr1oYvOEP4i48gcDqMQsfMcCISrQV
 MeQU3hS/obykUlNeqjmMI2CXrecqSsiq0hXd4DLaSoZ2Rb4Drx2Wj6sTQLIAgL2q
 GBYjHWMUpZXIFHQaH7am
 =nZh8
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - AF_IB (native IB addressing) for CMA from Sean Hefty
 - new mlx5 driver for Mellanox Connect-IB adapters (including post
   merge request fixes)
 - SRP fixes from Bart Van Assche (including fix to first merge request)
 - qib HW driver updates
 - resurrection of ocrdma HW driver development
 - uverbs conversion to create fds with O_CLOEXEC set
 - other small changes and fixes

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
  mlx5: Return -EFAULT instead of -EPERM
  IB/qib: Log all SDMA errors unconditionally
  IB/qib: Fix module-level leak
  mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec
  IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline
  IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd()
  mlx5_core: Fixes for sparse warnings
  IB/mlx5: Make profile[] static in main.c
  mlx5: Fix parameter type of health_handler_t
  mlx5: Add driver for Mellanox Connect-IB adapters
  IB/core: Add reserved values to enums for low-level driver use
  IB/srp: Bump driver version and release date
  IB/srp: Make HCA completion vector configurable
  IB/srp: Maintain a single connection per I_T nexus
  IB/srp: Fail I/O fast if target offline
  IB/srp: Skip host settle delay
  IB/srp: Avoid skipping srp_reset_host() after a transport error
  IB/srp: Fix remove_one crash due to resource exhaustion
  IB/qib: New transmitter tunning settings for Dell 1.1 backplane
  IB/core: Fix error return code in add_port()
  ...
2013-07-13 12:57:21 -07:00
Bart Van Assche
80d5e8a235 IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline
If the transport layer is offline it is more appropriate to let
srp_abort() return FAST_IO_FAIL instead of SUCCESS.

Reported-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-11 16:43:48 -07:00
Linus Torvalds
6d2fa9e141 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Lots of activity this round on performance improvements in target-core
  while benchmarking the prototype scsi-mq initiator code with
  vhost-scsi fabric ports, along with a number of iscsi/iser-target
  improvements and hardening fixes for exception path cases post v3.10
  merge.

  The highlights include:

   - Make persistent reservations APTPL buffer allocated on-demand, and
     drop per t10_reservation buffer.  (grover)
   - Make virtual LUN=0 a NULLIO device, and skip allocation of NULLIO
     device pages (grover)
   - Add transport_cmd_check_stop write_pending bit to avoid extra
     access of ->t_state_lock is WRITE I/O submission fast-path.  (nab)
   - Drop unnecessary CMD_T_DEV_ACTIVE check from
     transport_lun_remove_cmd to avoid extra access of ->t_state_lock in
     release fast-path.  (nab)
   - Avoid extra t_state_lock access in __target_execute_cmd fast-path
     (nab)
   - Drop unnecessary vhost-scsi wait_for_tasks=true usage +
     ->t_state_lock access in release fast-path.  (nab)
   - Convert vhost-scsi to use modern se_cmd->cmd_kref
     TARGET_SCF_ACK_KREF usage (nab)
   - Add tracepoints for SCSI commands being processed (roland)
   - Refactoring of iscsi-target handling of ISCSI_OP_NOOP +
     ISCSI_OP_TEXT to be transport independent (nab)
   - Add iscsi-target SendTargets=$IQN support for in-band discovery
     (nab)
   - Add iser-target support for in-band discovery (nab + Or)
   - Add iscsi-target demo-mode TPG authentication context support (nab)
   - Fix isert_put_reject payload buffer post (nab)
   - Fix iscsit_add_reject* usage for iser (nab)
   - Fix iscsit_sequence_cmd reject handling for iser (nab)
   - Fix ISCSI_OP_SCSI_TMFUNC handling for iser (nab)
   - Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED (nab)

  The last five iscsi/iser-target items are CC'ed to stable, as they do
  address issues present in v3.10 code.  They are certainly larger than
  I'd like for stable patch set, but are important to ensure proper
  REJECT exception handling in iser-target for 3.10.y"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits)
  iser-target: Ignore non TEXT + LOGOUT opcodes for discovery
  target: make queue_tm_rsp() return void
  target: remove unused codes from enum tcm_tmrsp_table
  iscsi-target: kstrtou* configfs attribute parameter cleanups
  iscsi-target: Fix tfc_tpg_auth_cit configfs length overflow
  iscsi-target: Fix tfc_tpg_nacl_auth_cit configfs length overflow
  iser-target: Add support for ISCSI_OP_TEXT opcode + payload handling
  iser-target: Rename sense_buf_[dma,len] to pdu_[dma,len]
  iser-target: Add vendor_err debug output
  target: Add (obsolete) checking for PMI/LBA fields in READ CAPACITY(10)
  target: Return correct sense data for IO past the end of a device
  target: Add tracepoints for SCSI commands being processed
  iser-target: Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED
  iscsi-target: Fix ISCSI_OP_SCSI_TMFUNC handling for iser
  iscsi-target: Fix iscsit_sequence_cmd reject handling for iser
  iscsi-target: Fix iscsit_add_reject* usage for iser
  iser-target: Fix isert_put_reject payload buffer post
  iscsi-target: missing kfree() on error path
  iscsi-target: Drop left-over iscsi_conn->bad_hdr
  target: Make core_scsi3_update_and_write_aptpl return sense_reason_t
  ...
2013-07-11 12:57:19 -07:00
Nicholas Bellinger
ca40d24eb8 iser-target: Ignore non TEXT + LOGOUT opcodes for discovery
This patch adds a check in isert_rx_opcode() to ignore non TEXT + LOGOUT
opcodes when SessionType=Discovery has been negotiated.

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-07 18:37:02 -07:00
Joern Engel
b79fafac70 target: make queue_tm_rsp() return void
The return value wasn't checked by any of the callers.  Assuming this is
correct behaviour, we can simplify some code by not bothering to
generate it.

nab: Add srpt_queue_data_in() + srpt_queue_tm_rsp() nops around
     srpt_queue_response() void return

Signed-off-by: Joern Engel <joern@logfs.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-07 18:36:53 -07:00
Nicholas Bellinger
adb54c2931 iser-target: Add support for ISCSI_OP_TEXT opcode + payload handling
This patch adds isert_handle_text_cmd() to handle incoming
ISCSI_OP_TEXT PDU processing, along with isert_put_text_rsp()
for posting ISCSI_OP_TEXT_RSP ib_send_wr response.

It copies ISCSI_OP_TEXT payload using unsolicited payload at
&iser_rx_desc->data[0] into iscsi_cmd->text_in_ptr for usage
with outgoing isert_put_text_rsp() -> iscsit_build_text_rsp()

v2 changes:
  - Let iscsit_build_text_rsp() determine any extra padding

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-07 18:36:48 -07:00
Nicholas Bellinger
dbbc5d1107 iser-target: Rename sense_buf_[dma,len] to pdu_[dma,len]
Now that these two variables are used for REJECT payloads as well
as SCSI response sense payloads, rename them to something that
makes more sense.

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-07 18:36:47 -07:00
Nicholas Bellinger
c5a2adbfcb iser-target: Add vendor_err debug output
Add output for ib_wc.vendor_err in isert_cq_[t,r]x_work(), which
is useful for debugging future issues.

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-07 18:36:46 -07:00
Nicholas Bellinger
b2cb96494d iser-target: Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED
This patch addresses a bug where RDMA_CM_EVENT_DISCONNECTED may occur
before the connection shutdown has been completed by rx/tx threads,
that causes isert_free_conn() to wait indefinately on ->conn_wait.

This patch allows isert_disconnect_work code to invoke rdma_disconnect
when isert_disconnect_work() process context is started by client
session reset before isert_free_conn() code has been reached.

It also adds isert_conn->conn_mutex protection for ->state within
isert_disconnect_work(), isert_cq_comp_err() and isert_free_conn()
code, along with isert_check_state() for wait_event usage.

(v2: Add explicit iscsit_cause_connection_reinstatement call
     during isert_disconnect_work() to force conn reset)

Cc: stable@vger.kernel.org  # 3.10+
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-07 18:35:56 -07:00
Nicholas Bellinger
186a964701 iscsi-target: Fix ISCSI_OP_SCSI_TMFUNC handling for iser
This patch adds target_get_sess_cmd reference counting for
iscsit_handle_task_mgt_cmd(), and adds a target_put_sess_cmd()
for the failure case.

It also fixes a bug where ISCSI_OP_SCSI_TMFUNC type commands
where leaking iscsi_cmd->i_conn_node and eventually triggering
an OOPs during struct isert_conn shutdown.

Cc: stable@vger.kernel.org  # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-06 22:01:23 -07:00
Nicholas Bellinger
561bf15892 iscsi-target: Fix iscsit_sequence_cmd reject handling for iser
This patch moves ISCSI_OP_REJECT failures into iscsit_sequence_cmd()
in order to avoid external iscsit_reject_cmd() reject usage for all
PDU types.

It also updates PDU specific handlers for traditional iscsi-target
code to not reset the session after posting a ISCSI_OP_REJECT during
setup.

(v2: Fix CMDSN_LOWER_THAN_EXP for ISCSI_OP_SCSI to call
     target_put_sess_cmd() after iscsit_sequence_cmd() failure)

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: stable@vger.kernel.org  # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-06 21:59:54 -07:00
Nicholas Bellinger
ba15991408 iscsi-target: Fix iscsit_add_reject* usage for iser
This patch changes iscsit_add_reject() + iscsit_add_reject_from_cmd()
usage to not sleep on iscsi_cmd->reject_comp to address a free-after-use
usage bug in v3.10 with iser-target code.

It saves ->reject_reason for use within iscsit_build_reject() so the
correct value for both transport cases.  It also drops the legacy
fail_conn parameter usage throughput iscsi-target code and adds
two iscsit_add_reject_cmd() and iscsit_reject_cmd helper functions,
along with various small cleanups.

(v2: Re-enable target_put_sess_cmd() to be called from
     iscsit_add_reject_from_cmd() for rejects invoked after
     target_get_sess_cmd() has been called)

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: stable@vger.kernel.org  # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-06 21:59:53 -07:00
Nicholas Bellinger
3df8f68aaf iser-target: Fix isert_put_reject payload buffer post
This patch adds the missing isert_put_reject() logic to post
a outgoing payload buffer to hold the 48 bytes of original PDU
header request payload for the rejected cmd.

It also fixes ISTATE_SEND_REJECT handling in isert_response_completion()
-> isert_do_control_comp() code, and drops incorrect iscsi_cmd_t->reject_comp
usage.

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: stable@vger.kernel.org  # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-06 21:59:35 -07:00
Linus Torvalds
80cc38b163 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "The usual stuff from trivial tree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
  treewide: relase -> release
  Documentation/cgroups/memory.txt: fix stat file documentation
  sysctl/net.txt: delete reference to obsolete 2.4.x kernel
  spinlock_api_smp.h: fix preprocessor comments
  treewide: Fix typo in printk
  doc: device tree: clarify stuff in usage-model.txt.
  open firmware: "/aliasas" -> "/aliases"
  md: bcache: Fixed a typo with the word 'arithmetic'
  irq/generic-chip: fix a few kernel-doc entries
  frv: Convert use of typedef ctl_table to struct ctl_table
  sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
  doc: clk: Fix incorrect wording
  Documentation/arm/IXP4xx fix a typo
  Documentation/networking/ieee802154 fix a typo
  Documentation/DocBook/media/v4l fix a typo
  Documentation/video4linux/si476x.txt fix a typo
  Documentation/virtual/kvm/api.txt fix a typo
  Documentation/early-userspace/README fix a typo
  Documentation/video4linux/soc-camera.txt fix a typo
  lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
  ...
2013-07-04 11:40:58 -07:00
Vu Pham
e8ca413558 IB/srp: Bump driver version and release date
Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-01 10:42:02 -07:00
Bart Van Assche
4b5e5f41c8 IB/srp: Make HCA completion vector configurable
Several InfiniBand HCAs allow configuring the completion vector per
CQ.  This allows spreading the workload created by IB completion
interrupts over multiple MSI-X vectors and hence over multiple CPU
cores.  In other words, configuring the completion vector properly not
only allows reducing latency on an initiator connected to multiple
SRP targets but also allows improving throughput.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-01 10:40:55 -07:00
Bart Van Assche
96fc248a4c IB/srp: Maintain a single connection per I_T nexus
An SRP target is required to maintain a single connection between
initiator and target.  This means that if the 'add_target' attribute
is used to create a second connection to a target, the first
connection will be logged out and that the SCSI error handler will
kick in.  The SCSI error handler will cause the SRP initiator to
reconnect, which will cause I/O over the second connection to fail.
Avoid such ping-pong behavior by disabling relogins.

If reconnecting manually is necessary, that is possible by deleting
and recreating an rport via sysfs.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-01 10:40:07 -07:00
Bart Van Assche
99e1c1398f IB/srp: Fail I/O fast if target offline
If reconnecting failed we know that no command completion will
be received anymore.  Hence let the SCSI error handler fail such
commands immediately.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-01 10:37:13 -07:00
Bart Van Assche
2742c1dadd IB/srp: Skip host settle delay
The SRP initiator implements host reset by reconnecting to the SRP
target.  That means that communication with the target is possible as
soon as host reset finished. Hence skip the host settle delay.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-27 16:44:39 -07:00
Bart Van Assche
086f44f588 IB/srp: Avoid skipping srp_reset_host() after a transport error
The SCSI error handler assumes that the transport layer is operational
if an eh_abort_handler() returns SUCCESS.  Hence srp_abort() only
should return SUCCESS if sending the ABORT TASK task management
function succeeded.  This patch avoids the SCSI error handler skipping
the srp_reset_host() call after a transport layer error.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-27 16:44:39 -07:00
Dotan Barak
1fe0cb8488 IB/srp: Fix remove_one crash due to resource exhaustion
If the add_one callback fails during driver load no resources are
allocated so there isn't a need to release any resources. Trying
to clean the resource may lead to the following kernel panic:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [<ffffffffa0132331>] srp_remove_one+0x31/0x240 [ib_srp]
    RIP: 0010:[<ffffffffa0132331>]  [<ffffffffa0132331>] srp_remove_one+0x31/0x240 [ib_srp]
    Process rmmod (pid: 4562, threadinfo ffff8800dd738000, task ffff8801167e60c0)
    Call Trace:
     [<ffffffffa024500e>] ib_unregister_client+0x4e/0x120 [ib_core]
     [<ffffffffa01361bd>] srp_cleanup_module+0x15/0x71 [ib_srp]
     [<ffffffff810ac6a4>] sys_delete_module+0x194/0x260
     [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-27 16:44:38 -07:00
Nicholas Bellinger
778de36896 iscsi/isert-target: Refactor ISCSI_OP_NOOP RX handling
This patch refactors ISCSI_OP_NOOP handling within iscsi-target in
order to handle iscsi_nopout payloads in a transport specific manner.

This includes splitting existing iscsit_handle_nop_out() into
iscsit_setup_nop_out() and iscsit_process_nop_out() calls, and
makes iscsit_handle_nop_out() be only used internally by traditional
iscsi socket calls.

Next update iser-target code to use new callers and add FIXME for
the handling iscsi_nopout payloads.  Also fix reject response handling
in iscsit_setup_nop_out() to use proper iscsit_add_reject_from_cmd().

v2: Fix uninitialized iscsit_handle_nop_out() payload_length usage (Fengguang)
v3: Remove left-over dead code in iscsit_setup_nop_out() (DanC)

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-06-24 22:35:51 -07:00
Or Gerlitz
28f292e879 IB/iser: Add Mellanox copyright
Add Mellanox copyright to the iser initiator source code which I maintain.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-04 17:03:12 -07:00
Roi Dayan
5b61ff43a7 IB/iser: Fix device removal flow
Change the code to destroy the "last opened" rdma_cm id after making
sure we released all other objects (QP, CQs, PD, etc) associated with
the IB device.

Since iser accesses the IB device using the rdma_cm id, we need to
free any objects that are related to the device that is associated
with the rdma_cm id prior to destroying that id.  When this isn't
done, the low level driver that created this device can be unloaded
before iser has a chance to free all the objects and a such a call may
invoke code segment which isn't valid any more and crash.

Cc: Sean Hefty <sean.hefty@intel.com
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-04 17:03:11 -07:00
Nicholas Bellinger
1d19f7800d ib_srpt: Call target_sess_cmd_list_set_waiting during shutdown_session
Given that srpt_release_channel_work() calls target_wait_for_sess_cmds()
to allow outstanding se_cmd_t->cmd_kref a change to complete, the call
to perform target_sess_cmd_list_set_waiting() needs to happen in
srpt_shutdown_session()

Also, this patch adds an explicit call to srpt_shutdown_session() within
srpt_drain_channel() so that target_sess_cmd_list_set_waiting() will be
called in the cases where TFO->shutdown_session() is not triggered
directly by TCM.

Cc: Joern Engel <joern@logfs.org>
Cc: Roland Dreier <roland@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-05-29 21:30:46 -07:00
Masanari Iida
8b513d0cf6 treewide: Fix typo in printk
Correct spelling typo in various part of drivers

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-05-28 12:02:13 +02:00
Joern Engel
be646c2d2b target: Remove unused wait_for_tasks bit in target_wait_for_sess_cmds
Drop unused transport_wait_for_tasks() check in target_wait_for_sess_cmds
shutdown code, and convert tcm_qla2xxx + ib_srpt fabric drivers.

Cc: Joern Engel <joern@logfs.org>
Cc: Roland Dreier <roland@kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-05-20 21:44:10 -07:00
Linus Torvalds
e0fd9affeb InfiniBand/RDMA changes for the 3.10 merge window:
- XRC transport fixes
  - Fix DHCP on IPoIB
  - mlx4 preparations for flow steering
  - iSER fixes
  - miscellaneous other fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJRisChAAoJEENa44ZhAt0hHLoP/iX5MxtHJ3X1u5KcARX/7nci
 CCH/VnD172re/KavCCg7zkbZQpS4jHCtW/CzLUCSPqBGOaj78HFTUkB3ragvUW7m
 ndERwplF8DP/i0x7Kk7Wau2a4RdlH0lqwucOjqTyQnbIdxknkz6w3Jcsb9Ic2lzx
 up0T0HaHtTxdVF6lXOB5QOIpUGg3l0Yu4euX2BA61WDoZj+VIYhgeWZewq0iV0D1
 rLtarJ+Or7mdwu2rNcDHgrD0lhF4SCBd3rx4lbc4F68Cr8JUz0Xe7liPLNskeLhW
 f3NEm3gmkYp9YI1otGsA0X/CyV6wnRk4mT8JMlOb2WNzeq2V13Z54/9ZyF5/gFD2
 JgzkQB9Ibf7EmTwXWd6+0+FA40Q6dNvnRnhddRM255dvDVw7nxUr2UzYH4Re/Z9K
 rNFjkvix2YUwEmoPjitWocz2kj2reDMqjtiVDmdGy1YbtnicH5GtkQsWkoPg8ON9
 m5jORUdzydTD+yBJwTiFP1EuFoG3TdfoZ7zHMJwWy/u8i308xD6WPGms9MTdjh8j
 7gjz2TCKr+vpuVRh/p6esCPPOTSsSeWDeowy7Sgpdf3qoqAImXsWXVrl2kXLhtyl
 1VIgHU3ztm7oqwmy0gQ/zVCo4CLLdsif2zmEIDpxJPnWaSq+D9LJdyLTcfdBjhQ8
 9SjUafe4msT1pIjNb7ND
 =1ojD
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - XRC transport fixes
 - Fix DHCP on IPoIB
 - mlx4 preparations for flow steering
 - iSER fixes
 - miscellaneous other fixes

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (23 commits)
  IB/iser: Add support for iser CM REQ additional info
  IB/iser: Return error to upper layers on EAGAIN registration failures
  IB/iser: Move informational messages from error to info level
  IB/iser: Add module version
  mlx4_core: Expose a few helpers to fill DMFS HW strucutures
  mlx4_core: Directly expose fields of DMFS HW rule control segment
  mlx4_core: Change a few DMFS fields names to match firmare spec
  mlx4: Match DMFS promiscuous field names to firmware spec
  mlx4_core: Move DMFS HW structs to common header file
  IB/mlx4: Set link type for RAW PACKET QPs in the QP context
  IB/mlx4: Disable VLAN stripping for RAW PACKET QPs
  mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level
  RDMA/iwcm: Don't touch cmid after dropping reference
  IB/qib: Correct qib_verbs_register_sysfs() error handling
  IB/ipath: Correct ipath_verbs_register_sysfs() error handling
  RDMA/cxgb4: Fix SQ allocation when on-chip SQ is disabled
  SRPT: Fix odd use of WARN_ON()
  IPoIB: Fix ipoib_hard_header() return value
  RDMA: Rename random32() to prandom_u32()
  RDMA/cxgb3: Fix uninitialized variable
  ...
2013-05-08 15:29:48 -07:00
Roland Dreier
ea9627c800 Merge branches 'cxgb4', 'ipoib', 'iser', 'misc', 'mlx4', 'qib' and 'srp' into for-next 2013-05-08 14:12:37 -07:00
Andrew Morton
50bea5c0d5 drivers/infiniband/hw: rename random32() to prandom_u32()
Use preferable function name which implies using a pseudo-random number
generator.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 18:38:27 -07:00
Or Gerlitz
8d8399deb0 IB/iser: Add support for iser CM REQ additional info
Annex A12 of the IBTA spec defines additional information that needs
to be provided through the CM exchange relating to usage of ZBVA (Zero
Based VAs) and Send With Invalidate over an iSER connection.

Currently, the initiator sets both to not supported, but does provide
the header so that existing iSER targets can be patched to start
looking on the private data carried by the CM.

This is a preparation step to enable iSER with HW drivers for which
FMRs are not supported, such as mlx4 VF instances or new HW devices
which might support only FRWR (Fast Registration Work-Requests) along
the details of the IB_DEVICE_MEM_MGT_EXTENSIONS device capability.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-05-01 17:34:14 -07:00
Or Gerlitz
450d1e40d5 IB/iser: Return error to upper layers on EAGAIN registration failures
Commit 819a087316 ("IB/iser: Avoid error prints on EAGAIN
registration failures") not only eliminated the error print on that
case, but rather also modified the code such that it doesn't return
any error to upper layers.  As a result a wrong mapping was used.  Fix
this to correctly return the error in that case.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-05-01 17:34:13 -07:00
Roi Dayan
4f36388261 IB/iser: Move informational messages from error to info level
Introduce iser_info() and move informational messages that were
printed as errors to use that macro. Also, cleanup printk leftovers to
use the existing macros.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>

[ Use pr_warn(... instead of printk(KERN_WARNING ....  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-05-01 17:34:13 -07:00
Roi Dayan
c1d786e682 IB/iser: Add module version
Add displaying module version, update the version to 1.1,
and remove the DRV_DATE define.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-05-01 17:34:13 -07:00
Linus Torvalds
73287a43cc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights (1721 non-merge commits, this has to be a record of some
  sort):

   1) Add 'random' mode to team driver, from Jiri Pirko and Eric
      Dumazet.

   2) Make it so that any driver that supports configuration of multiple
      MAC addresses can provide the forwarding database add and del
      calls by providing a default implementation and hooking that up if
      the driver doesn't have an explicit set of handlers.  From Vlad
      Yasevich.

   3) Support GSO segmentation over tunnels and other encapsulating
      devices such as VXLAN, from Pravin B Shelar.

   4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.

   5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
      Dukkipati.

   6) In the PHY layer, allow supporting wake-on-lan in situations where
      the PHY registers have to be written for it to be configured.

      Use it to support wake-on-lan in mv643xx_eth.

      From Michael Stapelberg.

   7) Significantly improve firewire IPV6 support, from YOSHIFUJI
      Hideaki.

   8) Allow multiple packets to be sent in a single transmission using
      network coding in batman-adv, from Martin Hundebøll.

   9) Add support for T5 cxgb4 chips, from Santosh Rastapur.

  10) Generalize the VXLAN forwarding tables so that there is more
      flexibility in configurating various aspects of the endpoints.
      From David Stevens.

  11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver,
      from Dmitry Kravkov.

  12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo
      Neira Ayuso.

  13) Start adding networking selftests.

  14) In situations of overload on the same AF_PACKET fanout socket, or
      per-cpu packet receive queue, minimize drop by distributing the
      load to other cpus/fanouts.  From Willem de Bruijn and Eric
      Dumazet.

  15) Add support for new payload offset BPF instruction, from Daniel
      Borkmann.

  16) Convert several drivers over to mdoule_platform_driver(), from
      Sachin Kamat.

  17) Provide a minimal BPF JIT image disassembler userspace tool, from
      Daniel Borkmann.

  18) Rewrite F-RTO implementation in TCP to match the final
      specification of it in RFC4138 and RFC5682.  From Yuchung Cheng.

  19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
      you like netlink, so I implemented netlink dumping of netlink
      sockets.") From Andrey Vagin.

  20) Remove ugly passing of rtnetlink attributes into rtnl_doit
      functions, from Thomas Graf.

  21) Allow userspace to be able to see if a configuration change occurs
      in the middle of an address or device list dump, from Nicolas
      Dichtel.

  22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
      Frederic Sowa.

  23) Increase accuracy of packet length used by packet scheduler, from
      Jason Wang.

  24) Beginning set of changes to make ipv4/ipv6 fragment handling more
      scalable and less susceptible to overload and locking contention,
      from Jesper Dangaard Brouer.

  25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
      instead.  From Hong Zhiguo.

  26) Optimize route usage in IPVS by avoiding reference counting where
      possible, from Julian Anastasov.

  27) Convert IPVS schedulers to RCU, also from Julian Anastasov.

  28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
      Eitzenberger.

  29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
      nfnetlink_log, and nfnetlink_queue.  From Gao feng.

  30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.

  31) Support several new r8169 chips, from Hayes Wang.

  32) Support tokenized interface identifiers in ipv6, from Daniel
      Borkmann.

  33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.

  34) Add 802.1ad vlan offload support, from Patrick McHardy.

  35) Support mmap() based netlink communication, also from Patrick
      McHardy.

  36) Support HW timestamping in mlx4 driver, from Amir Vadai.

  37) Rationalize AF_PACKET packet timestamping when transmitting, from
      Willem de Bruijn and Daniel Borkmann.

  38) Bring parity to what's provided by /proc/net/packet socket dumping
      and the info provided by netlink socket dumping of AF_PACKET
      sockets.  From Nicolas Dichtel.

  39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
      Poirier"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
  filter: fix va_list build error
  af_unix: fix a fatal race with bit fields
  bnx2x: Prevent memory leak when cnic is absent
  bnx2x: correct reading of speed capabilities
  net: sctp: attribute printl with __printf for gcc fmt checks
  netlink: kconfig: move mmap i/o into netlink kconfig
  netpoll: convert mutex into a semaphore
  netlink: Fix skb ref counting.
  net_sched: act_ipt forward compat with xtables
  mlx4_en: fix a build error on 32bit arches
  Revert "bnx2x: allow nvram test to run when device is down"
  bridge: avoid OOPS if root port not found
  drivers: net: cpsw: fix kernel warn on cpsw irq enable
  sh_eth: use random MAC address if no valid one supplied
  3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
  tg3: fix to append hardware time stamping flags
  unix/stream: fix peeking with an offset larger than data in queue
  unix/dgram: fix peeking with an offset larger than data in queue
  unix/dgram: peek beyond 0-sized skbs
  openvswitch: Remove unneeded ovs_netdev_get_ifindex()
  ...
2013-05-01 14:08:52 -07:00
Linus Torvalds
6da6dc2380 Merge branch 'for-next-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target update from Nicholas Bellinger:
 "The highlights this round include:

   - Add fileio support for WRITE_SAME w/ UNMAP=1 discard (asias)
   - Add fileio support for UNMAP discard (asias)
   - Add tcm_vhost hotplug support to work with upstream QEMU
     vhost-scsi-pci code (asias + mst)
   - Check for aborted sequence in tcm_fc response path (mdr)
   - Add initial iscsit_transport support into iscsi-target code (nab)
   - Refactor iscsi-target RX PDU logic + export request PDU handling
     (nab)
   - Refactor iscsi-target TX queue logic + export response PDU creation
     (nab)
   - Add new iSCSI Extentions for RDMA (ISER) target driver (Or + nab)

  The biggest changes revolve around iscsi-target refactoring in order
  to support the iser-target driver.  This includes the conversion of
  the iscsi-target data-path to use modern se_cmd->cmd_kref counting,
  and allowing transport independent aspects of RX/TX PDU
  request/response handling be shared across existing traditional
  iscsi-target code, and the new iser-target code.

  Thanks to Or Gerlitz + Mellanox for supporting the iser-target
  development effort!"

* 'for-next-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (25 commits)
  iser-target: Add iSCSI Extensions for RDMA (iSER) target driver
  tcm_vhost: Enable VIRTIO_SCSI_F_HOTPLUG
  tcm_vhost: Add ioctl to get and set events missed flag
  tcm_vhost: Add hotplug/hotunplug support
  tcm_vhost: Refactor the lock nesting rule
  tcm_fc: Check for aborted sequence
  iscsi-target: Add iser network portal attribute
  iscsi-target: Refactor TX queue logic + export response PDU creation
  iscsi-target: Refactor RX PDU logic + export request PDU handling
  iscsi-target: Add per transport iscsi_cmd alloc/free
  iscsi-target: Add iser-target parameter keys + setup during login
  iscsi-target: Initial traditional TCP conversion to iscsit_transport
  iscsi-target: Add iscsit_transport API template
  target: Add export of target_get_sess_cmd symbol
  target: Change default sense key of NOT_READY
  target/file: Set is_nonrot attribute
  target: Add sbc_execute_unmap() helper
  target/iblock: Add iblock_do_unmap() helper
  target/file: Add fd_do_unmap() helper
  target/file: Add UNMAP emulation support
  ...
2013-04-30 13:14:57 -07:00
Nicholas Bellinger
b8d26b3be8 iser-target: Add iSCSI Extensions for RDMA (iSER) target driver
This patch adds support for iSCSI Extensions for RDMA target mode,
and includes CQ pooling per isert_device context distributed across
multiple active iser target sessions.

It also uses cmwq process context for RX / TX ib_post_cq() polling
via isert_cq_desc->cq_[rx,tx]_work invoked by isert_cq_[rx,tx]_callback()
hardIRQ context callbacks.

v5 changes:

- Use ISER_RECV_DATA_SEG_LEN instead of hardcoded value in ISER_RX_PAD_SIZE (Or)
- Fix make W=1 warnings (Or)
- Add missing depends on NET && INFINIBAND_ADDR_TRANS in Kconfig (Randy + Or)
- Make isert_device_find_by_ib_dev() return proper ERR_PTR (Wei Yongjun)
- Properly setup iscsi_np->np_sockaddr in isert_setup_np() (Shlomi + nab)
- Add special case for early ISCSI_OP_SCSI_CMD exception handling (nab)

v4 changes:
- Mark isert_cq_rx_work as static (Or)
- Drop unnecessary ib_dma_sync_single_for_cpu + ib_dma_sync_single_for_device
  calls for isert_cmd->sense_buf_dma from isert_put_response (Or)
- Use 12288 for ISER_RX_PAD_SIZE base to save extra page per
  struct iser_rx_desc (Or + nab)
- Drop now unnecessary isert_rx_desc usage, and convert RX users to
  iser_rx_desc (Or + nab)
- Move isert_[alloc,free]_rx_descriptors() ahead of
  isert_create_device_ib_res() usage (nab)
- Mark isert_cq_[rx,tx]_callback() + prototypes as static
- Fix 'warning: 'ret' may be used uninitialized' warning for
  isert_create_device_ib_res on powerpc allmodconfig (fengguang + nab)
- Fix 'warning: 'ret' may be used uninitialized' warning for
  isert_connect_request on i386 allyesconfig (fengguang + nab)
- Fix pr_debug conversion specification in isert_rx_completion()
  (fengguang + nab)
- Drop unnecessary isert_conn->conn_cm_id != NULL check in
  isert_connect_release causing the build warning:
  "variable dereferenced before check 'isert_conn->conn_cm_id'"
- Fix isert_lid + isert_np leak in isert_setup_np failure path
- Add isert_conn->conn_wait_comp_err usage in isert_free_conn()
  for isert_cq_comp_err completion path
- Add isert_conn->logout_posted bit to determine decrement of
  isert_conn->post_send_buf_count from logout response completion
- Always set ISER_CONN_DOWN from isert_disconnect_work() callback

v3 changes:

- Convert to use per isert_cq_desc->cq_[rx,tx]_work + drop tasklets (Or + nab)
- Move IB_EVENT_QP_LAST_WQE_REACHED warn into correct
  isert_qp_event_callback (Or)
- Drop unnecessary IB_ACCESS_REMOTE_* access flag usage in
  isert_create_device_ib_res (Or)
- Add common isert_init_send_wr(), and convert isert_put_* calls (Or)
- Move to verbs+core logic to single ib_isert.[c,h]  (Or + nab)
- Add kmem_cache isert_cmd_cache usage for descriptor allocation (nab)
- Move common ib_post_send() logic used by isert_put_*() to
  isert_post_response() (nab)
- Add isert_put_reject call in isert_response_queue() for posting
  ISCSI_REJECT response. (nab)
- Add ISTATE_SEND_REJECT checking in isert_do_control_comp. (nab)

v2 changes:

- Drop unused ISERT_ADDR_ROUTE_TIMEOUT define
- Add rdma_notify() call for IB_EVENT_COMM_EST in isert_qp_event_callback()
- Make isert_query_device() less verbose
- Drop unused RDMA_CM_EVENT_ADDR_ERROR and RDMA_CM_EVENT_ROUTE_ERROR
  cases from isert_cma_handler()
- Drop unused rdma/ib_fmr_pool.h include
- Update isert_conn_setup_qp() to assign cq based upon least used
- Add isert_create_device_ib_res() to setup PD, CQs and MRs for each
  underlying struct ib_device, instead of using per isert_conn resources.
- Add isert_free_device_ib_res() to release PD, CQs and MRs for each
  underlying struct ib_device.
- Add isert_device_find_by_ib_dev()
- Change isert_connect_request() to drop PD, CQs and MRs allocation,
  and use isert_device_find_by_ib_dev() instead.
- Add isert_device_try_release()
- Change isert_connect_release() to decrement cq_active_qps, and drop
  PD, CQs and MRs resource release.
- Update isert_connect_release() to call isert_device_try_release()
- Make isert_create_device_ib_res() determine device->cqs_used based
  upon num_online_cpus()
- Drop misleading isert_dump_ib_wc() usage
- Drop unused rdma/ib_fmr_pool.h include
- Use proper xfer_len for login PDUs in isert_rx_completion()
- Add isert_release_cmd() usage
- Change isert_alloc_cmd() to setup iscsi_cmd.release_cmd() pointer
- Change isert_put_cmd() to perform per iscsi_opcode specific release
  logic
- Add isert_unmap_cmd() call for ISCSI_OP_SCSI_CMD from isert_put_cmd()
- Change isert_send_completion() to call
  atomic_dec(&isert_conn->post_send_buf_count)
  based upon per iscsi_opcode logic
- Drop ISTATE_REMOVE processing from isert_immediate_queue()
- Drop ISTATE_SEND_DATAIN processing from isert_response_queue()
- Drop ISTATE_SEND_STATUS processing from isert_response_queue()
- Drop iscsit_transport->iscsit_unmap_cmd() and ->iscsit_free_cmd()
- Convert iser_cq_tx_tasklet() to use struct isert_cq_desc pooling logic
- Convert isert_cq_tx_callback() to use struct isert_cq_desc pooling
  logic
- Convert iser_cq_rx_tasklet() to use struct isert_cq_desc pooling logic
- Convert isert_cq_rx_callback() to use struct isert_cq_desc pooling
  logic
- Add explict iscsit_stop_dataout_timer() call to
  isert_do_rdma_read_comp()
- Use isert_get_dataout() for iscsit_transport->iscsit_get_dataout()
  caller
- Drop ISTATE_SEND_R2T processing from isert_immediate_queue()
- Drop unused rdma/ib_fmr_pool.h include
- Drop isert_cmd->cmd_kref in favor of se_cmd->cmd_kref usage
- Add struct isert_device in order to support multiple EQs + CQ pooling
- Add struct isert_cq_desc
- Drop tasklets and cqs from isert_conn
- Bump ISERT_MAX_CQ to 64
- Various minor checkpatch fixes

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-04-25 01:09:41 -07:00
Patrick McHardy
dc850b0e68 IPoIB: add support for TIPC protocol
Support TIPC in the IPoIB driver. Since IPoIB now keeps track of its own
neighbour entries and doesn't require the packet to have a dst_entry
anymore, the only necessary changes are to:

- not drop multicast TIPC packets because of the unknown ethernet type
- handle unicast TIPC packets similar to IPv4/IPv6 unicast packets

in ipoib_start_xmit().

An alternative would be to remove all ethertype limitations since they're
not necessary anymore, all TIPC needs to know about is ARP and RARP since
it wants to always perform "path find", even if a path is already known.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-17 14:18:33 -04:00
Grant Grundler
532ec6f1c0 SRPT: Fix odd use of WARN_ON()
While WARN_ON("const string") will work, it's intent is not obvious.
The warning is more useful by showing the "state" value.

Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-04-16 22:58:08 -07:00
Doug Ledford
83bdd3b96c IPoIB: Fix ipoib_hard_header() return value
If you have a patched up dhcp server (and dhclient), they will use
AF_PACKET/SOCK_DGRAM pair to send dhcp packets over IPoIB.

However, when testing an upstream kernel, this has been broken for a
very long time (I tested 2.6.34, 2.6.38, 3.0, 3.1, 3.8, HEAD).

It turns out that the hard_header routine in ipoib is not following
the API and is returning 0 even when it pushed data onto the skb.
This then causes af_packet.c to overwrite the header just pushed with
data from user space.

Fixing this gets DHCP working on IPoIB.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-04-16 22:57:09 -07:00
Akinobu Mita
cc529c0d72 RDMA: Rename random32() to prandom_u32()
Use more preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Steve Wise <swise@chelsio.com>
Cc: linux-rdma@vger.kernel.org
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-04-16 22:47:05 -07:00
Mike Marciniszyn
1ee9e2aa7b IPoIB: Fix send lockup due to missed TX completion
Commit f0dc117abd ("IPoIB: Fix TX queue lockup with mixed UD/CM
traffic") attempts to solve an issue where unprocessed UD send
completions can deadlock the netdev.

The patch doesn't fully resolve the issue because if more than half
the tx_outstanding's were UD and all of the destinations are RC
reachable, arming the CQ doesn't solve the issue.

This patch uses the IB_CQ_REPORT_MISSED_EVENTS on the
ib_req_notify_cq().  If the rc is above 0, the UD send cq completion
callback is called directly to re-arm the send completion timer.

This issue is seen in very large parallel filesystem deployments
and the patch has been shown to correct the issue.

Cc: <stable@vger.kernel.org>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-03-22 18:01:04 -07:00
Roland Dreier
ef4e359d9b Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'misc', 'mlx4', 'qib' and 'srp' into for-next 2013-02-26 09:17:56 -08:00
Roland Dreier
f72dd56690 IPoIB: Free ipoib neigh on path record failure so path rec queries are retried
If IPoIB fails to look up a path record (eg if it tries during an SM
failover when one SM is dead but the new one hasn't taken over yet), the
driver ends up with a neighbour structure but no address handle (AH).
There's no mechanism to recover from this: any further packets sent to
this destination will be silently dumped in ipoib_start_xmit().

Fix this by freeing the neighbour structures when a path rec query
fails, so that the next packet queued to be sent will trigger a new path
record query.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 09:42:15 -08:00
Bart Van Assche
2ce19e72f4 IB/srp: Fail I/O requests if the transport is offline
If an SRP target is no longer reachable and srp_reset_host() fails to
reconnect then ib_srp will invoke scsi_remove_host().  That function
will invoke __scsi_remove_device() for each LUN.  And that last
function will change the device state from SDEV_TRANSPORT_OFFLINE into
SDEV_CANCEL.  Certain user space software, e.g. older versions of
multipathd, continue queueing I/O to SCSI devices that are in the
SDEV_CANCEL state.

If these I/O requests are submitted as SG_IO that means that the
REQ_PREEMPT flag will be set and hence that these requests will be
passed to srp_queuecommand().  These requests will time out.  If new
requests are queued fast enough from user space these active requests
will prevent __scsi_remove_device() to finish.

Avoid this by failing I/O requests in the SDEV_CANCEL state if the
transport is offline.  Introduce a new variable to keep track of the
transport state instead of failing requests if (!target->connected ||
target->qp_in_error), so that the SCSI error handler has a chance to
retry commands after a transport layer failure occurred.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 09:31:14 -08:00
Bart Van Assche
c7c4e7ff80 IB/srp: Avoid endless SCSI error handling loop
If a SCSI command times out it is passed to the SCSI error
handler. The SCSI error handler will try to abort the commands that
timed out.  If aborting fails, a device reset will be attempted.  If
the device reset also fails a host reset will be attempted.  If the
host reset also fails the whole procedure will be repeated.

srp_abort() and srp_reset_device() fail for a QP in the error state.
srp_reset_host() fails after host removal has started.  Hence if the
SCSI error handler gets invoked after host removal has started and
with the QP in the error state an endless loop will be triggered.

Modify the SCSI error handling functions in ib_srp as follows:
- Abort SCSI commands properly even if the QP is in the error state.
- Make srp_reset_host() reset SCSI requests even after host removal
  has already started or if reconnecting fails.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dave@thedillows.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 09:31:14 -08:00
Bart Van Assche
3780d1f088 IB/srp: Avoid sending a task management function needlessly
Do not send a task management function if sending will fail anyway
because either there is no RDMA/RC connection or the QP is in the
error state.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dave@thedillows.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 09:31:14 -08:00
Bart Van Assche
e1b2f13aba IB/srp: Track connection state properly
Remove an assignment that incorrectly overwrites the connection state
update by srp_connect_target().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dave@thedillows.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 09:31:14 -08:00
Or Gerlitz
5525d210fd IB/iser: Enable iser when FMRs are not supported
Reuse the "SG unaligned for FMR" driver flow to make the initiator
functional when running over driver instance which doesn't support
FMRs, such as a mlx4 virtual function.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-22 00:22:30 -08:00
Or Gerlitz
819a087316 IB/iser: Avoid error prints on EAGAIN registration failures
Under IO/CPU stress its possible that the FMR pool might not have a
free FMR mapping element for iSER to use because of incomplete
background unmapping processing.  In that case we get -EAGAIN and the
IO is pushed back to the SCSI layer which soon retries it.  No need to
be so verbose about that.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-22 00:22:29 -08:00
Or Gerlitz
b96e4abaae IB/iser: Use proper define for the commands per LUN value advertised to SCSI ML
ISER_DEF_CMD_PER_LUN was meant to be ISCSI_DEF_XMIT_CMDS_MAX, not plain 128

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-22 00:22:29 -08:00
Itai Garbi
5a2815f03c IPoIB: Don't attempt to release resources on error flow
If the ipoib client info isn't found on the _remove_one callback, we
must not attempt to scan the returned null list.  Found by Coverity.

Signed-off-by: Itai Garbi <igarbi@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-19 08:21:36 -08:00
Yan Burman
4b48680b55 IPoIB: Add version and firmware info to ethtool reporting
Implement version info as well as report firmware version and bus info
of the underlying IB HW device.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-19 08:21:36 -08:00
Shlomo Pongratz
9d1ad66e3e IPoIB: Fix ipoib_neigh hashing to use the correct daddr octets
The hash function introduced in commit b63b70d877 ("IPoIB: Use a
private hash table for path lookup in xmit path") was designd to use
the 3 octets of the IPoIB HW address that holds the remote QPN.
However, this currently isn't the case on little-endian machines,
because the the code there uses the flags part (octet[0]) and not the
last octet of the QPN (octet[3]).  Fix this.

The fix caused a checkpatch warning on line over 80 characters, to
solve that changed the name of the temp variable that holds the daddr.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-19 08:21:35 -08:00
Shlomo Pongratz
7e5a90c25f IPoIB: Fix crash due to skb double destruct
After commit b13912bbb4 ("IPoIB: Call skb_dst_drop() once skb is
enqueued for sending"), using connected mode and running multithreaded
iperf for long time, ie

    iperf -c <IP> -P 16 -t 3600

results in a crash.

After the above-mentioned patch, the driver is calling skb_orphan() and
skb_dst_drop() after calling post_send() in ipoib_cm.c::ipoib_cm_send()
(also in ipoib_ib.c::ipoib_send())

The problem with this is, as is written in a comment in both routines,
"it's entirely possible that the completion handler will run before we
execute anything after the post_send()."  This leads to running the
skb cleanup routines simultaneously in two different contexts.

The solution is to always perform the skb_orphan() and skb_dst_drop()
before queueing the send work request.  If an error occurs, then it
will be no different than the regular case where dev_free_skb_any() in
the completion path, which is assumed to be after these two routines.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-05 09:35:06 -08:00
Roland Dreier
b13912bbb4 IPoIB: Call skb_dst_drop() once skb is enqueued for sending
Currently, IPoIB delays collecting send completions for TX packets in
order to batch work more efficiently.  It does skb_orphan() right after
queuing the packets so that destructors run early, to avoid problems
like holding socket send buffers for too long (since we might not
collect a send completion until a long time after the packet is
actually sent).

However, IPoIB clears IFF_XMIT_DST_RELEASE because it actually looks
at skb_dst() to update the PMTU when it gets a too-long packet.  This
means that the packets sitting in the TX ring with uncollected send
completions are holding a reference on the dst.  We've seen this lead
to pathological behavior with respect to route and neighbour GC.  The
easy fix for this is to call skb_dst_drop() when we call skb_orphan().

Also, give packets sent via connected mode (CM) the same skb_orphan()
/ skb_dst_drop() treatment that packets sent via datagram mode get.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-19 09:16:43 -08:00
Linus Torvalds
5bd665f28d Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull target updates from Nicholas Bellinger:
 "It has been a very busy development cycle this time around in target
  land, with the highlights including:

   - Kill struct se_subsystem_dev, in favor of direct se_device usage
     (hch)
   - Simplify reservations code by combining SPC-3 + SCSI-2 support for
     virtual backends only (hch)
   - Simplify ALUA code for virtual only backends, and remove left over
     abstractions (hch)
   - Pass sense_reason_t as return value for I/O submission path (hch)
   - Refactor MODE_SENSE emulation to allow for easier addition of new
     mode pages.  (roland)
   - Add emulation of MODE_SELECT (roland)
   - Fix bug in handling of ExpStatSN wrap-around (steve)
   - Fix bug in TMR ABORT_TASK lookup in qla2xxx target (steve)
   - Add WRITE_SAME w/ UNMAP=0 support for IBLOCK backends (nab)
   - Convert ib_srpt to use modern target_submit_cmd caller + drop
     legacy ioctx->kref usage (nab)
   - Convert ib_srpt to use modern target_submit_tmr caller (nab)
   - Add link_magic for fabric allow_link destination target_items for
     symlinks within target_core_fabric_configfs.c code (nab)
   - Allocate pointers in instead of full structs for
     config_group->default_groups (sebastian)
   - Fix 32-bit highmem breakage for FILEIO (sebastian)

  All told, hch was able to shave off another ~1K LOC by killing the
  se_subsystem_dev abstraction, along with a number of PR + ALUA
  simplifications.  Also, a nice patch by Roland is the refactoring of
  MODE_SENSE handling, along with the addition of initial MODE_SELECT
  emulation support for virtual backends.

  Sebastian found a long-standing issue wrt to allocation of full
  config_group instead of pointers for config_group->default_group[]
  setup in a number of areas, which ends up saving memory with big
  configurations.  He also managed to fix another long-standing BUG wrt
  to broken 32-bit highmem support within the FILEIO backend driver.

  Thank you again to everyone who contributed this round!"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits)
  target/iscsi_target: Add NodeACL tags for initiator group support
  target/tcm_fc: fix the lockdep warning due to inconsistent lock state
  sbp-target: fix error path in sbp_make_tpg()
  sbp-target: use simple assignment in tgt_agent_rw_agent_state()
  iscsi-target: use kstrdup() for iscsi_param
  target/file: merge fd_do_readv() and fd_do_writev()
  target/file: Fix 32-bit highmem breakage for SGL -> iovec mapping
  target: Add link_magic for fabric allow_link destination target_items
  ib_srpt: Convert TMR path to target_submit_tmr
  ib_srpt: Convert I/O path to target_submit_cmd + drop legacy ioctx->kref
  target: Make spc_get_write_same_sectors return sector_t
  target/configfs: use kmalloc() instead of kzalloc() for default groups
  target/configfs: allocate only 6 slots for dev_cg->default_groups
  target/configfs: allocate pointers instead of full struct for default_groups
  target: update error handling for sbc_setup_write_same()
  iscsit: use GFP_ATOMIC under spin lock
  iscsi_target: Remove redundant null check before kfree
  target/iblock: Forward declare bio helpers
  target: Clean up flow in transport_check_aborted_status()
  target: Clean up logic in transport_put_cmd()
  ...
2012-12-15 14:25:10 -08:00
Bart Van Assche
dc1bdbd9b8 IB/srp: Allow SRP disconnect through sysfs
Make it possible to disconnect the IB RC connection used by the SRP
protocol to communicate with a target.

Have the SRP transport layer create a sysfs "delete" attribute for
initiator drivers that support this functionality.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:33 -08:00
Vu Pham
55d93898a1 IB/srp: send disconnect request without waiting for CM timewait exit
Now that SRP recreates the CM ID, QP, and CQ for each connection,
there is no need to wait for the timewait state to complete.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:32 -08:00
Ishai Rabinovitz
73aa89ed9e IB/srp: destroy and recreate QP and CQs when reconnecting
HW QP FATAL errors persist over a reset operation, but we can recover
from that by recreating the QP and associated CQs for each connection.
Creating a new QP/CQ also completely forecloses any possibility of
getting stale completions or packets on the new connection.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>

[ updated to current code from OFED, cleaned up commit message ]

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:32 -08:00
Bart Van Assche
ef6c49d87c IB/srp: Eliminate state SRP_TARGET_DEAD
Only queue removal work after having changed the target state
into SRP_TARGET_REMOVED and not if that state was already equal
to SRP_TARGET_REMOVED.  That allows us to remove the state
SRP_TARGET_DEAD.  Add a call to srp_disconnect_target() in
srp_remove_target() -- due to previous changes it is now safe to
invoke that function even if the IB connection has already
been disconnected.  This change allows us to replace the target
removal code in srp_remove_one() by an (indirect) call to
srp_remove_target().  Rename srp_target_port.work into
srp_target_port.remove_work to reflect its usage.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:31 -08:00
Bart Van Assche
ee12d6a80c IB/srp: Introduce the helper function srp_remove_target()
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:31 -08:00
Bart Van Assche
294c875a65 IB/srp: Suppress superfluous error messages
Keep track of the connection state.  Only report QP errors while
connected.  Only invoke ib_send_cm_dreq() when connected so that
invoking srp_disconnect_target() after having received a DREQ does not
cause an error message to be printed.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:31 -08:00
Bart Van Assche
4f0af69799 IB/srp: Process all error completions
If the RDMA RC connection is closed, tell the SCSI mid-layer to
terminate all pending commands instead of only the first.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:31 -08:00
Bart Van Assche
948d1e889e IB/srp: Introduce srp_handle_qp_err()
Introduce the function srp_handle_qp_err(), change the type of
qp_in_error from int into bool and move the initialization of that
variable from srp_reconnect_target() to srp_connect_target().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:30 -08:00
Bart Van Assche
224db15743 IB/srp: Simplify SCSI error handling
Since scsi_remove_host() has been modified so that SCSI error handling
functions will no longer be invoked after scsi_remove_host() returns,
the test at the start of srp_send_tsk_mgmt() is now superfluous.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:30 -08:00
Bart Van Assche
f371823120 IB/srp: Keep processing commands during host removal
Some SCSI upper layer drivers, e.g. sd, issue SCSI commands from
inside scsi_remove_host() (see the sd_shutdown() call in sd_remove()).
Make sure that these commands have a chance to reach the SCSI device.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:30 -08:00
Bart Van Assche
09be70a238 IB/srp: Eliminate state SRP_TARGET_CONNECTING
Block the SCSI host while reconnecting instead of representing the
reconnection activity as a distinct SRP target state.  This allows us
to eliminate the target state SRP_TARGET_CONNECTING.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:29 -08:00
Bart Van Assche
c9b03c1ae5 IB/srp: Increase block layer timeout
Increase the block layer timeout for disks so that it is above the
InfiniBand transport layer timeout.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:29 -08:00
Nicholas Bellinger
3e4f574857 ib_srpt: Convert TMR path to target_submit_tmr
This patch converts the TMR path in srpt_handle_tsk_mgmt() to use
target_submit_tmr() with TARGET_SCF_ACK_KREF flag usage.

v2: Drop ununused res in target_submit_tmr (Fengguang.Wu)

Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-11-28 11:20:28 -08:00
Nicholas Bellinger
9474b04313 ib_srpt: Convert I/O path to target_submit_cmd + drop legacy ioctx->kref
This patch converts the main srpt_handle_cmd() I/O path to use modern
target_submit_cmd() with TARGET_SCF_ACK_KREF flag usage.  This includes
dropping the original internal ioctx->kref + srpt_put_send_ioctx() usage
in favor of target_put_sess_cmd() w/ se_cmd_t->cmd_kref within ib_srpt
response callbacks.

It also updates srpt_abort_cmd() to call target_put_sess_cmd() for
completion of aborted commands, and adds target_wait_for_sess_cmds() into
srpt_release_channel_work() to allow outstanding I/O to complete during
session shutdown.

Also, go ahead and update srpt_handle_tsk_mgmt() to make the remaining
transport_init_se_cmd() to setup the ioctx->cmd with se_tmr_req.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-11-28 01:10:59 -08:00
Christoph Hellwig
de103c93af target: pass sense_reason as a return value
Pass the sense reason as an explicit return value from the I/O submission
path instead of storing it in struct se_cmd and using negative return
values.  This cleans up a lot of the code pathes, and with the sparse
annotations for the new sense_reason_t type allows for much better
error checking.

(nab: Convert spc_emulate_modesense + spc_emulate_modeselect to use
      sense_reason_t with Roland's MODE SELECT changes)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-11-06 20:55:46 -08:00
Linus Torvalds
a188e7e93a Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull scsi target updates from Nicholas Bellinger:
 "Things have been calm for the most part with no new fabric drivers in
  flight for v3.7 (we're up to eight now !), so this update is primarily
  focused on addressing a few long-standing items within target-core and
  iscsi-target fabric code.

  The highlights include:

   - target: Simplify fabric sense data length handling (roland)
   - qla2xxx: Fix endianness of task management response code (roland)
   - target: fix truncation of mode data, support zero allocation length
     (paolo)
   - target: Properly support zero-length commands in normal processing
     path (paolo)
   - iscsi-target: Correctly set 0xffffffff field within ISCSI_OP_REJECT
     PDU (ronnie + nab)
   - iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG
     demo-mode (ronnie + nab)
   - target/file: Re-enable optional fd_buffered_io=1 operation (nab +
     hch)
   - iscsi-target: Add MaxXmitDataSegmenthLength forr target ->
     initiator MDRSL declaration (nab)
   - target: Add target_submit_cmd_map_sgls for SGL fabric memory
     passthrough (nab + hch)
   - tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls (hch +
     nab)
   - tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls (nab
     + hch)

  The last series for adding a new target_submit_cmd_map_sgls() fabric
  caller (as requested by hch) that accepts pre-allocated SGL memory
  (using existing logic), along with converting tcm_loop + tcm_vhost has
  only been in -next for the last days, but has gotten enough review
  +testing and is clear enough a mechanical change that I think it's
  reasonable to merge for -rc1 code.

  Thanks again to everyone who contributed this round! Extra special
  thanks to Roland (PureStorage) for tracking down the qla2xxx target
  TMR response code endian issue, and to Paolo (Redhat) for resolving
  the long standing zero-length CDB issues within target-core between
  virtual and pSCSI backends."

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (44 commits)
  iscsi-target: Bump defaults for nopin_timeout + nopin_response_timeout values
  iscsit: proper endianess conversions
  iscsit: use the itt_t abstract type
  iscsit: add missing endianess conversion in iscsit_check_inaddr_any
  iscsit: remove incorrect unlock in iscsit_build_sendtargets_resp
  iscsit: mark various functions static
  target/iscsi: precedence bug in iscsit_set_dataout_sequence_values()
  target/usb-gadget: strlen() doesn't count the terminator
  target/usb-gadget: remove duplicate initialization
  tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls
  target: Add control CDB READ payload zero work-around
  tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls
  target: Add target_submit_cmd_map_sgls for SGL fabric memory passthrough
  iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG demo-mode
  iscsi-target: Change iscsi_target_seq_pdu_list.c to honor MaxXmitDataSegmentLength
  iscsi-target: Add MaxXmitDataSegmentLength connection recovery check
  iscsi-target: Convert incoming PDU payload checks to MaxXmitDataSegmentLength
  iscsi-target: Enable MaxXmitDataSegmentLength operation in login path
  iscsi-target: Add base MaxXmitDataSegmentLength code
  target/file: Re-enable optional fd_buffered_io=1 operation
  ...
2012-10-10 19:52:19 +09:00
Roland Dreier
56040f07b2 Merge branches 'cma', 'ipoib', 'iser', 'mlx4' and 'nes' into for-next 2012-10-04 19:12:22 -07:00
Alex Tabachnik
5a33a66949 IB/iser: Add more RX CQs to scale out processing of SCSI responses
RX/TX CQs will now be selected from a per HCA pool.  For the RX flow
this has the effect of using different interrupt vectors when using
low level drivers (such as mlx4) that map the "vector" param provided
by the ULP on CQ creation to a dedicated IRQ/MSI-X vector.  This
allows the RX flow processing of IO responses to be distributed across
multiple CPUs.

QPs (--> iSER sessions) are assigned to CQs in round robin order using
the CQ with the minimum number of sessions attached to it.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-03 21:26:49 -07:00
Roland Dreier
71d9c5f9e6 IPoIB: Fix build with CONFIG_INFINIBAND_IPOIB_CM=n
With the new netlink support in commit 862096a8bb ("IB/ipoib: Add more
rtnl_link_ops callbacks") we need ipoib_set_mode() to be available even
if connected mode isn't built.  Move the function from ipoib_cm.c to
ipoib_main.c (and make a few CM-related macros available unconditonally).

This fixes the build error

    drivers/built-in.o: In function 'ipoib_changelink':
    ipoib_netlink.c:(.text+0x6a5fc9): undefined reference to 'ipoib_set_mode'
    ipoib_netlink.c:(.text+0x6a5fe3): undefined reference to 'ipoib_set_mode'

when CONFIG_INFINIBAND_IPOIB_CM isn't set.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Reported-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-02 21:33:41 -07:00
Linus Torvalds
7a9a2970b5 First batch of InfiniBand/RDMA changes for the 3.7 merge window:
- mlx4 IB support for SR-IOV
  - A couple of SRP initiator fixes
  - Batch of nes hardware driver fixes
  - Fix for long-standing use-after-free crash in IPoIB
  - Other miscellaneous fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQav4jAAoJEENa44ZhAt0hmL0QAJTuMdSOzYFd/NB38owJCNM2
 kz/N1GlBm3z98fIlGo8u+lzgV2qxqZSAzsJsouMeK38KiAX3CL8HKe44A1QvTM6v
 dXTNL4JFX24/YF+nlmMY8Av518I9Mkte3BZCnpYkBjVFBWe0ePwoRC/btfBXPDIV
 0snq4OtjoBAn00dOOyuZ5PoyY9xf0z4UB0Gple9sM4mzEb8wVWdNDDPOiuPJc6fA
 L+gk6HLkZDg54+QswafdKYwpeTq45wIKLmCdS3oUNmppMLVhZY8rECOwzSa+KiTr
 /Yo19n+zl+IBlvjQHhmUqGHvdD17PaGlr+TckAsQqmVfXUH5qqpEnkF8FoEK59c5
 YA3lVU8Sj4BPhJ7qX54CuN3767mZizakkxCr9iPRzABFTgzWVgcSgCrE8jjx4i0h
 Pam+L5bmANFStgmGR8PmXiNgcrCUcEqYHsOWDDAnHa5ekb2nyv1JL1c18hlY9hC3
 Xb1YTMZFwvofGza89hBu7oHrMbLOUc5kW2lBpvUn2nlyf3i0F8ISlVbVbNjFA54p
 60/jHa2VOQ2CcJUJKnJOk4ajOOEfHnPtMn2q96XJ69Dp8+eSYEO/G+0i1OlChq4h
 ClnG0Yp+NkT1o8WXMd7guDR+RsXt+DXIij5TiUWRIqnIlopIsMTRhNH28tMu4jQL
 fgN5n987wru91ewdX4gW
 =PAcy
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband updates from Roland Dreier:
 "First batch of InfiniBand/RDMA changes for the 3.7 merge window:
   - mlx4 IB support for SR-IOV
   - A couple of SRP initiator fixes
   - Batch of nes hardware driver fixes
   - Fix for long-standing use-after-free crash in IPoIB
   - Other miscellaneous fixes"

This merge also removes a new use of __cancel_delayed_work(), and
replaces it with the regular cancel_delayed_work() that is now irq-safe
thanks to the workqueue updates.

That said, I suspect the sequence in question should probably use
"mod_delayed_work()".  I just did the minimal "don't use deprecated
functions" fixup, though.

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (45 commits)
  IB/qib: Fix local access validation for user MRs
  mlx4_core: Disable SENSE_PORT for multifunction devices
  mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAs
  mlx4_core: Stash PCI ID driver_data in mlx4_priv structure
  IB/srp: Avoid having aborted requests hang
  IB/srp: Fix use-after-free in srp_reset_req()
  IB/qib: Add a qib driver version
  RDMA/nes: Fix compilation error when nes_debug is enabled
  RDMA/nes: Print hardware resource type
  RDMA/nes: Fix for crash when TX checksum offload is off
  RDMA/nes: Cosmetic changes
  RDMA/nes: Fix for incorrect MSS when TSO is on
  RDMA/nes: Fix incorrect resolving of the loopback MAC address
  mlx4_core: Fix crash on uninitialized priv->cmd.slave_sem
  mlx4_core: Trivial cleanups to driver log messages
  mlx4_core: Trivial readability fix: "0X30" -> "0x30"
  IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes
  mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations
  mlx4: Paravirtualize Node Guids for slaves
  mlx4: Activate SR-IOV mode for IB
  ...
2012-10-02 17:20:40 -07:00
Roland Dreier
d172f5a4ab Merge branches 'cma', 'cxgb4', 'ipoib', 'mlx4', 'mlx4-sriov', 'nes', 'qib' and 'srp' into for-linus 2012-10-02 07:43:59 -07:00
Or Gerlitz
862096a8bb IB/ipoib: Add more rtnl_link_ops callbacks
Add the rtnl_link_ops changelink and fill_info callbacks, through
which the admin can now set/get the driver mode, etc policies.
Maintain the proprietary sysfs entries only for legacy childs.

For child devices, set dev->iflink to point to the parent
device ifindex, such that user space tools can now correctly
show the uplink relation as done for vlan, macvlan, etc
devices. Pointed out by Patrick McHardy <kaber@trash.net>

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-01 17:12:22 -04:00
Bart Van Assche
d853667091 IB/srp: Avoid having aborted requests hang
We need to call scsi_done() for commands after we abort them.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:36:48 -07:00
Bart Van Assche
9b796d06d5 IB/srp: Fix use-after-free in srp_reset_req()
srp_free_req() uses the scsi_cmnd structure contents to unmap
buffers, so we must invoke srp_free_req() before we release
ownership of that structure.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:36:47 -07:00
Patrick McHardy
bea1e22df4 IPoIB: Fix use-after-free of multicast object
Fix a crash in ipoib_mcast_join_task().  (with help from Or Gerlitz)

Commit c8c2afe360 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue, and hence the workqueue can't be
flushed from the context of ipoib_stop().

In the current code, ipoib_stop() (which doesn't flush the workqueue)
calls ipoib_mcast_dev_flush(), which goes and deletes all the
multicast entries.  This takes place without any synchronization with
a possible running instance of ipoib_mcast_join_task() for the same
ipoib device, leading to a crash due to NULL pointer dereference.

Fix this by making sure that the workqueue is flushed before
ipoib_mcast_dev_flush() is called.  To make that possible, we move the
RTNL-lock wrapped code to ipoib_mcast_join_finish().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:32:33 -07:00
David S. Miller
6a06e5e1bb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/team/team.c
	drivers/net/usb/qmi_wwan.c
	net/batman-adv/bat_iv_ogm.c
	net/ipv4/fib_frontend.c
	net/ipv4/route.c
	net/l2tp/l2tp_netlink.c

The team, fib_frontend, route, and l2tp_netlink conflicts were simply
overlapping changes.

qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.

With help from Antonio Quartulli.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-28 14:40:49 -04:00
Or Gerlitz
9baa0b0364 IB/ipoib: Add rtnl_link_ops support
Add rtnl_link_ops to IPoIB, with the first usage being child device
create/delete through them. Childs devices are now either legacy ones,
created/deleted through the ipoib sysfs entries, or RTNL ones.

Adding support for RTNL childs involved refactoring of ipoib_vlan_add
which is now used by both the sysfs and the link_ops code.

Also, added ndo_uninit entry to support calling unregister_netdevice_queue
from the rtnl dellink entry. This required removal of calls to
ipoib_dev_cleanup from the driver in flows which use unregister_netdevice,
since the networking core will invoke ipoib_uninit which does exactly that.

Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20 16:49:17 -04:00
Roland Dreier
9c58b7ddd7 target: Simplify fabric sense data length handling
Every fabric driver has to supply a se_tfo->set_fabric_sense_len()
method, just so iSCSI can return an offset of 2.  However, every fabric
driver is already allocating a sense buffer and passing it into the
target core, either via transport_init_se_cmd() or target_submit_cmd().

So instead of having iSCSI pass the start of its sense buffer into the
core and then later tell the core to skip the first 2 bytes, it seems
easier for iSCSI just to do the offset of 2 when it passes the sense
buffer into the core.  Then we can drop the se_tfo->set_fabric_sense_len()
everywhere, and just add a couple of lines of code to iSCSI to set the
sense data length to the beginning of the buffer right before it sends
it over the network.

(nab: Remove .set_fabric_sense_len usage from tcm_qla2xxx_npiv_ops +
      change transport_get_sense_buffer to follow v3.6-rc6 code w/o
      ->set_fabric_sense_len usage)

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-09-17 17:12:58 -07:00
Roland Dreier
2ed772b7b9 target: Remove unused target_core_fabric_ops.get_fabric_sense_len method
There are no callers of se_tfo->get_fabric_sense_len(), so we should
stop having every fabric driver implement it.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-09-17 16:15:47 -07:00
Shlomo Pongratz
b5120a6e11 IPoIB: Fix AB-BA deadlock when deleting neighbours
Lockdep points out a circular locking dependency betwwen the ipoib
device priv spinlock (priv->lock) and the neighbour table rwlock
(ntbl->rwlock).

In the normal path, ie neigbour garbage collection task, the neigh
table rwlock is taken first and then if the neighbour needs to be
deleted, priv->lock is taken.

However in some error paths, such as in ipoib_cm_handle_tx_wc(),
priv->lock is taken first and then ipoib_neigh_free routine is called
which in turn takes the neighbour table ntbl->rwlock.

The solution is to get rid the neigh table rwlock completely and use
only priv->lock.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-12 09:21:45 -07:00
Shlomo Pongratz
66172c0993 IPoIB: Fix memory leak in the neigh table deletion flow
If the neighbours hash table is empty when unloading the module, then
ipoib_flush_neighs(), the cleanup routine, isn't called and the
memory used for the hash table itself leaked.

To fix this, ipoib_flush_neighs() is allways called, and another
completion object is added to signal when the table is freed.

Once invoked, ipoib_flush_neighs() flushes all the neighbours (if
there are any), calls the the hash table RCU free routine, which now
signals completion of the deletion process, and waits for the last
neighbour to be freed.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-12 09:05:03 -07:00
Roland Dreier
c0369b296e Merge branches 'cma', 'ipoib', 'misc', 'mlx4', 'ocrdma', 'qib' and 'srp' into for-next 2012-08-16 09:38:39 -07:00
Bart Van Assche
220329916c IB/srp: Fix a race condition
Avoid a crash caused by the scmnd->scsi_done(scmnd) call in
srp_process_rsp() being invoked with scsi_done == NULL.  This can
happen if a reply is received during or after a command abort.

Reported-by: Joseph Glanville <joseph.glanville@orionvm.com.au>
Reference: http://marc.info/?l=linux-rdma&m=134314367801595
Cc: <stable@vger.kernel.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-15 12:00:48 -07:00
Masanari Iida
142ad5db2b IB: Fix typos in infiniband drivers
Correct spelling typos in comments in drivers/infiniband.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-15 11:56:19 -07:00
Shlomo Pongratz
6c723a68c6 IB/ipoib: Fix RCU pointer dereference of wrong object
Commit b63b70d877 ("IPoIB: Use a private hash table for path lookup
in xmit path") introduced a bug where in ipoib_neigh_free() (which is
called from a few errors flows in the driver), rcu_dereference() is
invoked with the wrong pointer object, which results in a crash.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-14 15:21:44 -07:00
Shlomo Pongratz
fa16ebed31 IB/ipoib: Add missing locking when CM object is deleted
Commit b63b70d877 ("IPoIB: Use a private hash table for path lookup
in xmit path") introduced a bug where in ipoib_cm_destroy_tx() a CM
object is moved between lists without any supported locking.  Under a
stress test, this eventually leads to list corruption and a crash.

Previously when this routine was called, callers were taking the
device priv lock.  Currently this function is called from the RCU
callback associated with neighbour deletion.  Fix the race by taking
the same lock we used to before.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-14 15:21:44 -07:00
Shlomo Pongratz
b63b70d877 IPoIB: Use a private hash table for path lookup in xmit path
Dave Miller <davem@davemloft.net> provided a detailed description of
why the way IPoIB is using neighbours for its own ipoib_neigh struct
is buggy:

    Any time an ipoib_neigh is changed, a sequence like the following is made:

    			spin_lock_irqsave(&priv->lock, flags);
    			/*
    			 * It's safe to call ipoib_put_ah() inside
    			 * priv->lock here, because we know that
    			 * path->ah will always hold one more reference,
    			 * so ipoib_put_ah() will never do more than
    			 * decrement the ref count.
    			 */
    			if (neigh->ah)
    				ipoib_put_ah(neigh->ah);
    			list_del(&neigh->list);
    			ipoib_neigh_free(dev, neigh);
    			spin_unlock_irqrestore(&priv->lock, flags);
    			ipoib_path_lookup(skb, n, dev);

    This doesn't work, because you're leaving a stale pointer to the freed up
    ipoib_neigh in the special neigh->ha pointer cookie.  Yes, it even fails
    with all the locking done to protect _changes_ to *ipoib_neigh(n), and
    with the code in ipoib_neigh_free() that NULLs out the pointer.

    The core issue is that read side calls to *to_ipoib_neigh(n) are not
    being synchronized at all, they are performed without any locking.  So
    whether we hold the lock or not when making changes to *ipoib_neigh(n)
    you still can have threads see references to freed up ipoib_neigh
    objects.

    	cpu 1			cpu 2
    	n = *ipoib_neigh()
    				*ipoib_neigh() = NULL
    				kfree(n)
    	n->foo == OOPS

    [..]

    Perhaps the ipoib code can have a private path database it manages
    entirely itself, which holds all the necessary information and is
    looked up by some generic key which is available easily at transmit
    time and does not involve generic neighbour entries.

See <http://marc.info/?l=linux-rdma&m=132812793105624&w=2> and
<http://marc.info/?l=linux-rdma&w=2&r=1&s=allows+references+to+freed+memory&q=b>
for the full discussion.

This patch aims to solve the race conditions found in the IPoIB driver.

The patch removes the connection between the core networking neighbour
structure and the ipoib_neigh structure.  In addition to avoiding the
race described above, it allows us to handle SKBs carrying IP packets
that don't have any associated neighbour.

We add an ipoib_neigh hash table with N buckets where the key is the
destination hardware address.  The ipoib_neigh is fetched from the
hash table and instead of the stashed location in the neighbour
structure. The hash table uses both RCU and reference counting to
guarantee that no ipoib_neigh instance is ever deleted while in use.

Fetching the ipoib_neigh structure instance from the hash also makes
the special code in ipoib_start_xmit that handles remote and local
bonding failover redundant.

Aged ipoib_neigh instances are deleted by a garbage collection task
that runs every M seconds and deletes every ipoib_neigh instance that
was idle for at least 2*M seconds. The deletion is safe since the
ipoib_neigh instances are protected using RCU and reference count
mechanisms.

The number of buckets (N) and frequency of running the GC thread (M),
are taken from the exported arb_tbl.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-30 07:46:50 -07:00
Linus Torvalds
5dedb9f3bd InfiniBand/RDMA changes for the 3.6 merge window:
- Updates to the qib low-level driver
  - First chunk of changes for SR-IOV support for mlx4 IB
  - RDMA CM support for IPv6-only binding
  - Other misc cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQDXhUAAoJEENa44ZhAt0hZxwQAJydS9f9me31AGa45SAq8rA8
 6e3LgnQ6jS38he7LZZdSsT7g25jROCKOcTj6VUkNSVAVSkK8zcp7wngjDxw50IK6
 FF9hNeljMkWBflOhxzB34DRGCW4b4J+Yt1o7v1RtawiG/Mhri/6imKL0Aqjt4GwX
 a1MPZn+xI2osLujfdHJtATPWWB9jCaXdFe4DJUNPdqJhS6TN7s8OP3XMiqoJtdaV
 ptHeSSbjdUR1mg/h2LU2FVmWHXNSBxn7MEsrBBRkQVyiEkXieFBLwPTp4DqmAUJf
 xugm6Hf4sbiQ+QuU0baJODt56wveuYVQ4HKUzE0urUFXyU4TUB9blrehWZnKsRte
 wo/w4nvVlnXgGhHH0Igq76RDX8aCwc/6uQJ/29oChWjrei3HE0LjmIlPAu0vAhyw
 ViLe02/r2gKXQv1NxIqhPmGsJTZizg2mUk2eEJHHPQb/NGL7iNo6b+141AfURoqQ
 goGmlGdzffCRpmo6FXFZ57RPVRS4gwMunCY/Pmvq5a2t4oZh8899l2+V3N7bCfvH
 +JdavxAjia9U4IlgPsAqVaz8z8TyHOY/lEd75Wnw8q9www3kLx7hASvHsdwrS4LL
 ihzECzsaXOSoIXQzghs+6iyp1pzmzE9ve8OqbIGJStzlrOLyn7Gjo+Ixwm0SLsTO
 I7h9iJ1OMKFZ8bzmHsWb
 =f09j
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - Updates to the qib low-level driver
 - First chunk of changes for SR-IOV support for mlx4 IB
 - RDMA CM support for IPv6-only binding
 - Other misc cleanups and fixes

Fix up some add-add conflicts in include/linux/mlx4/device.h and
drivers/net/ethernet/mellanox/mlx4/main.c

* tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (30 commits)
  IB/qib: checkpatch fixes
  IB/qib: Add congestion control agent implementation
  IB/qib: Reduce sdma_lock contention
  IB/qib: Fix an incorrect log message
  IB/qib: Fix QP RCU sparse warnings
  mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them
  mlx4_core: Allow guests to have IB ports
  mlx4_core: Implement mechanism for reserved Q_Keys
  net/mlx4_core: Free ICM table in case of error
  IB/cm: Destroy idr as part of the module init error flow
  mlx4_core: Remove double function declarations
  IB/mlx4: Fill the masked_atomic_cap attribute in query device
  IB/mthca: Fill in sq_sig_type in query QP
  IB/mthca: Warning about event for non-existent QPs should show event type
  IB/qib: Fix sparse RCU warnings in qib_keys.c
  net/mlx4_core: Initialize IB port capabilities for all slaves
  mlx4: Use port management change event instead of smp_snoop
  IB/qib: RCU locking for MR validation
  IB/qib: Avoid returning EBUSY from MR deregister
  IB/qib: Fix UC MR refs for immediate operations
  ...
2012-07-24 13:56:26 -07:00
Linus Torvalds
3c4cfadef6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David S Miller:

 1) Remove the ipv4 routing cache.  Now lookups go directly into the FIB
    trie and use prebuilt routes cached there.

    No more garbage collection, no more rDOS attacks on the routing
    cache.  Instead we now get predictable and consistent performance,
    no matter what the pattern of traffic we service.

    This has been almost 2 years in the making.  Special thanks to
    Julian Anastasov, Eric Dumazet, Steffen Klassert, and others who
    have helped along the way.

    I'm sure that with a change of this magnitude there will be some
    kind of fallout, but such things ought the be simple to fix at this
    point.  Luckily I'm not European so I'll be around all of August to
    fix things :-)

    The major stages of this work here are each fronted by a forced
    merge commit whose commit message contains a top-level description
    of the motivations and implementation issues.

 2) Pre-demux of established ipv4 TCP sockets, saves a route demux on
    input.

 3) TCP SYN/ACK performance tweaks from Eric Dumazet.

 4) Add namespace support for netfilter L4 conntrack helpers, from Gao
    Feng.

 5) Add config mechanism for Energy Efficient Ethernet to ethtool, from
    Yuval Mintz.

 6) Remove quadratic behavior from /proc/net/unix, from Eric Dumazet.

 7) Support for connection tracker helpers in userspace, from Pablo
    Neira Ayuso.

 8) Allow userspace driven TX load balancing functions in TEAM driver,
    from Jiri Pirko.

 9) Kill off NLMSG_PUT and RTA_PUT macros, more gross stuff with
    embedded gotos.

10) TCP Small Queues, essentially minimize the amount of TCP data queued
    up in the packet scheduler layer.  Whereas the existing BQL (Byte
    Queue Limits) limits the pkt_sched --> netdevice queuing levels,
    this controls the TCP --> pkt_sched queueing levels.

    From Eric Dumazet.

11) Reduce the number of get_page/put_page ops done on SKB fragments,
    from Alexander Duyck.

12) Implement protection against blind resets in TCP (RFC 5961), from
    Eric Dumazet.

13) Support the client side of TCP Fast Open, basically the ability to
    send data in the SYN exchange, from Yuchung Cheng.

    Basically, the sender queues up data with a sendmsg() call using
    MSG_FASTOPEN, then they do the connect() which emits the queued up
    fastopen data.

14) Avoid all the problems we get into in TCP when timers or PMTU events
    hit a locked socket.  The TCP Small Queues changes added a
    tcp_release_cb() that allows us to queue work up to the
    release_sock() caller, and that's what we use here too.  From Eric
    Dumazet.

15) Zero copy on TX support for TUN driver, from Michael S. Tsirkin.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1870 commits)
  genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP
  r8169: revert "add byte queue limit support".
  ipv4: Change rt->rt_iif encoding.
  net: Make skb->skb_iif always track skb->dev
  ipv4: Prepare for change of rt->rt_iif encoding.
  ipv4: Remove all RTCF_DIRECTSRC handliing.
  ipv4: Really ignore ICMP address requests/replies.
  decnet: Don't set RTCF_DIRECTSRC.
  net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse.
  ipv4: Remove redundant assignment
  rds: set correct msg_namelen
  openvswitch: potential NULL deref in sample()
  tcp: dont drop MTU reduction indications
  bnx2x: Add new 57840 device IDs
  tcp: avoid oops in tcp_metrics and reset tcpm_stamp
  niu: Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value
  niu: Fix to check for dma mapping errors.
  net: Fix references to out-of-scope variables in put_cmsg_compat()
  net: ethernet: davinci_emac: add pm_runtime support
  net: ethernet: davinci_emac: Remove unnecessary #include
  ...
2012-07-24 10:01:50 -07:00
Linus Torvalds
cb47c1831f Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull target updates from Nicholas Bellinger:
 "There have been lots of work in a number of areas this past round.
  The highlights include:

   - Break out target_core_cdb.c emulation into SPC/SBC ops (hch)
   - Add a parse_cdb method to target backend drivers (hch)
   - Move sync_cache + write_same + unmap into spc_ops (hch)
   - Use target_execute_cmd for WRITEs in iscsi_target + srpt (hch)
   - Offload WRITE I/O backend submission in tcm_qla2xxx + tcm_fc (hch +
     nab)
   - Refactor core_update_device_list_for_node() into enable/disable
     funcs (agrover)
   - Replace the TCM processing thread with a TMR work queue (hch)
   - Fix regression in transport_add_device_to_core_hba from TMR
     conversion (DanC)
   - Remove racy, now-redundant check of sess_tearing_down with qla2xxx
     (roland)
   - Add range checking, fix reading of data len + possible underflow in
     UNMAP (roland)
   - Allow for target_submit_cmd() returning errors + convert fabrics
     (roland + nab)
   - Drop bogus struct file usage for iSCSI/SCTP (viro)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (54 commits)
  iscsi-target: Drop bogus struct file usage for iSCSI/SCTP
  target: NULL dereference on error path
  target: Allow for target_submit_cmd() returning errors
  target: Check number of unmap descriptors against our limit
  target: Fix possible integer underflow in UNMAP emulation
  target: Fix reading of data length fields for UNMAP commands
  target: Add range checking to UNMAP emulation
  target: Add generation of LOGICAL BLOCK ADDRESS OUT OF RANGE
  target: Make unnecessarily global se_dev_align_max_sectors() static
  target: Remove se_session.sess_wait_list
  qla2xxx: Remove racy, now-redundant check of sess_tearing_down
  target: Check sess_tearing_down in target_get_sess_cmd()
  sbp-target: Consolidate duplicated error path code in sbp_handle_command()
  target: Un-export target_get_sess_cmd()
  qla2xxx: Get rid of redundant qla_tgt_sess.tearing_down
  target: Make core_disable_device_list_for_node use pre-refactoring lock ordering
  target: refactor core_update_device_list_for_node()
  target: Eliminate else using boolean logic
  target: Misc retval cleanups
  target: Remove hba param from core_dev_add_lun
  ...
2012-07-22 13:31:57 -07:00
David S. Miller
6700c2709c net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}()
This will be used so that we can compose a full flow key.

Even though we have a route in this context, we need more.  In the
future the routes will be without destination address, source address,
etc. keying.  One ipv4 route will cover entire subnets, etc.

In this environment we have to have a way to possess persistent storage
for redirects and PMTU information.  This persistent storage will exist
in the FIB tables, and that's why we'll need to be able to rebuild a
full lookup flow key here.  Using that flow key will do a fib_lookup()
and create/update the persistent entry.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17 03:29:28 -07:00
Christoph Hellwig
e672a47fd9 srpt: use target_execute_cmd for WRITEs in srpt_handle_rdma_comp
srpt_handle_rdma_comp is called from kthread context and thus can execute
target_execute_cmd directly.  srpt_abort_cmd sets the CMD_T_LUN_STOP
flag directly, and thus the abuse of transport_generic_handle_data can be
replaced with an opencoded variant of that code path.  I'm still not happy
about a fabric driver poking into target core internals like this, but
let's defer the bigger architecture changes for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-07-16 17:35:18 -07:00
David S. Miller
04c9f416e3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/batman-adv/bridge_loop_avoidance.c
	net/batman-adv/bridge_loop_avoidance.h
	net/batman-adv/soft-interface.c
	net/mac80211/mlme.c

With merge help from Antonio Quartulli (batman-adv) and
Stephen Rothwell (drivers/net/usb/qmi_wwan.c).

The net/mac80211/mlme.c conflict seemed easy enough, accounting for a
conversion to some new tracing macros.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 23:56:33 -07:00
Eric Dumazet
b28ba72665 IPoIB: fix skb truesize underestimatiom
Or Gerlitz reported triggering of WARN_ON_ONCE(delta < len); in
skb_try_coalesce()
This warning tracks drivers that incorrectly set skb->truesize

IPoIB indeed allocates a full page to store a fragment, but only
accounts in skb->truesize the used part of the page (frame length)

This patch fixes skb truesize underestimation, and
also fixes a performance issue, because RX skbs have not enough tailroom
to allow IP and TCP stacks to pull their header in skb linear part
without an expensive call to pskb_expand_head()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: Shlomo Pongartz <shlomop@mellanox.com>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 23:33:12 -07:00
Roland Dreier
d90f9b3591 IB: Use IS_ENABLED(CONFIG_IPV6)
Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-08 18:04:32 -07:00
David S. Miller
700db99d01 ipoib: Need to do dst_neigh_lookup_skb() outside of priv->lock.
Otherwise local_bh_enable() complains.

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 21:08:05 -07:00
David S. Miller
178709bbfe ipoib: Convert over to dev_lookup_neigh_skb().
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 01:09:36 -07:00
Linus Torvalds
c23ddf7857 InfiniBand/RDMA changes for the 3.5 merge window:
- Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
  - Add generic and mlx4 support for "raw" QPs: allow suitably privileged
    applications to send and receive arbitrary packets directly to/from
    the hardware
  - Add "doorbell drop" handling to the cxgb4 driver
  - A fairly large batch of qib hardware driver changes
  - A few fixes for lockdep-detected issues
  - A few other miscellaneous fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPumbZAAoJEENa44ZhAt0h8LgP/0fXe7Szm3n6P6UvMAVqkagM
 4PpreH3mpWUFpzqeQE1JPDtgx700R6aPipbHqgIN+k61RWMpLjICGcNx7iwxn1I+
 zqdquGygWgjceLz+BLVlk+iBmJt3vZ3fPRAXc7fdP+jhIarWkNIOy1pXWTUuRvED
 jL8jIaxhCcgAVzm/zNyt6IPxkaHvCz7K9wqmpyU0dsO9OyPdGvWA9+CkGXwmOCPq
 mxSVhWnfGsMkPBsL7EgTC5KP/ox2PKq6rFgysmVVS+rKCpP0L8BEVQyGX3Gf8KA8
 yV+KdTi9ofDnFrv6R7Wz0v7HRUih8GRssakzBu7Y7HLfK1M/QwMG0GUAibXGZObc
 vUXuQ3uRJ/cIzMPXqKeGYwpb5t+TmxyjhWu44OjCUQkNau91+9BSbA69S88KXc49
 aTJiCZlhPuGf4uGMWJJuPLcE2xO2QCZj+8ckL2STYrIip6GWlCH02kJaQmRkuWH2
 UhMOeJDBC4nvh4EQT/WwHpGzyhkavE2ayfo5YemxBJXo+P5Mdbf7WIDRQDLUEeQH
 F8sPoccH4hDiAorN/SkTsm14jVTP7oWW1M40Ont59Nhbgm88MsVkvjoneHnfBvbD
 HjK92soCWnYTAoREfj0G4xUxZgMdOZcezWrX0rx5LJ8Ju9y4zAi3cKGr7lg6hs4X
 syKfN0VjiDRtJ+pxayi3
 =yWfr
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
 - Add generic and mlx4 support for "raw" QPs: allow suitably privileged
   applications to send and receive arbitrary packets directly to/from
   the hardware
 - Add "doorbell drop" handling to the cxgb4 driver
 - A fairly large batch of qib hardware driver changes
 - A few fixes for lockdep-detected issues
 - A few other miscellaneous fixes and cleanups

Fix up trivial conflict in drivers/net/ethernet/emulex/benet/be.h.

* tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (53 commits)
  RDMA/cxgb4: Include vmalloc.h for vmalloc and vfree
  IB/mlx4: Fix mlx4_ib_add() error flow
  IB/core: Fix IB_SA_COMP_MASK macro
  IB/iser: Fix error flow in iser ep connection establishment
  IB/mlx4: Increase the number of vectors (EQs) available for ULPs
  RDMA/cxgb4: Add query_qp support
  RDMA/cxgb4: Remove kfifo usage
  RDMA/cxgb4: Use vmalloc() for debugfs QP dump
  RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues
  RDMA/cxgb4: Disable interrupts in c4iw_ev_dispatch()
  RDMA/cxgb4: Add DB Overflow Avoidance
  RDMA/cxgb4: Add debugfs RDMA memory stats
  cxgb4: DB Drop Recovery for RDMA and LLD queues
  cxgb4: Common platform specific changes for DB Drop Recovery
  cxgb4: Detect DB FULL events and notify RDMA ULD
  RDMA/cxgb4: Drop peer_abort when no endpoint found
  RDMA/cxgb4: Always wake up waiters in c4iw_peer_abort_intr()
  mlx4_core: Change bitmap allocator to work in round-robin fashion
  RDMA/nes: Don't call event handler if pointer is NULL
  RDMA/nes: Fix for the ORD value of the connecting peer
  ...
2012-05-21 17:54:55 -07:00
Linus Torvalds
c9bfa7d75b Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull scsi-target changes from Nicholas Bellinger:
 "There has been lots of work in existing code in a number of areas this
  past cycle.  The major highlights have been:

   * Removal of transport_do_task_sg_chain() from core + fabrics
     (Roland)
   * target-core: Removal of se_task abstraction from target-core and
     enforce hw_max_sectors for pSCSI backends (hch)
   * Re-factoring of iscsi-target tx immediate/response queues (agrover)
   * Conversion of iscsi-target back to using target core memory
     allocation logic (agrover)

  We've had one last minute iscsi-target patch go into for-next to
  address a nasty regression bug related to the target core allocation
  logic conversion from agrover that is not included in friday's
  linux-next build, but has been included in this series.

  On the new fabric module code front for-3.5, here is a brief status
  update for the three currently in flight this round:

   * usb-gadget target driver:

  Sebastian Siewior's driver for supporting usb-gadget target mode
  operation.  This will be going out as a separate PULL request from
  target-pending/usb-target-merge with subsystem maintainer ACKs.  There
  is one minor target-core patch in this series required to function.

   * sbp ieee-1394/firewire target driver:

  Chris Boot's driver for supportting the Serial Block Protocol (SBP)
  across IEEE-1394 Firewire hardware.  This will be going out as a
  separate PULL request from target-pending/sbp-target-merge with two
  additional drivers/firewire/ patches w/ subsystem maintainer ACKs.

   * qla2xxx LLD target mode infrastructure changes + tcm_qla2xxx:

  The Qlogic >= 24xx series HW target mode LLD infrastructure patch-set
  and tcm_qla2xxx fabric driver.  Support for FC target mode using
  qla2xxx LLD code has been officially submitted by Qlogic to James
  below, and is currently outstanding but not yet merged into
  scsi.git/for-next..

    [PATCH 00/22] qla2xxx: Updates for scsi "misc" branch
    http://www.spinics.net/lists/linux-scsi/msg59350.html

  Note there are *zero* direct dependencies upon this for-next series
  for the qla2xxx LLD target + tcm_qla2xxx patches submitted above, and
  over the last days the target mode team has been tracking down an
  tcm_qla2xxx specific active I/O shutdown bug that appears to now be
  almost squashed for 3.5-rc-fixes."

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (47 commits)
  iscsi-target: Fix iov_count calculation bug in iscsit_allocate_iovecs
  iscsi-target: remove dead code in iscsi_check_valuelist_for_support
  target: Handle ATA_16 passthrough for pSCSI backend devices
  target: Add MI_REPORT_TARGET_PGS ext. header + implict_trans_secs attribute
  target: Fix MAINTENANCE_IN service action CDB checks to use lower 5 bits
  target: add support for the WRITE_VERIFY command
  target: make target_put_session void
  target: cleanup transport_execute_tasks()
  target: Remove max_sectors device attribute for modern se_task less code
  target: lock => unlock typo in transport_lun_wait_for_tasks
  target: Enforce hw_max_sectors for SCF_SCSI_DATA_SG_IO_CDB
  target: remove the t_se_count field in struct se_cmd
  target: remove the t_task_cdbs_ex_left field in struct se_cmd
  target: remove the t_task_cdbs_left field in struct se_cmd
  target: remove struct se_task
  target: move the state and execute lists to the command
  target: simplify command to task linkage
  target: always allocate a single task
  target: replace ->execute_task with ->execute_cmd
  target: remove the task_sectors field in struct se_task
  ...
2012-05-21 17:37:09 -07:00
Or Gerlitz
7d9c0de4ab IB/iser: Fix error flow in iser ep connection establishment
The current error flow code was releasing the IB connection object and
calling iscsi_destroy_endpoint() directly without going through the
reference counting mechanism introduced in commit 39ff05d ("IB/iser:
Enhance disconnection logic for multi-pathing"). This resulted in a
double free of the iscsi endpoint object, which causes a kernel NULL
pointer dereference.  Fix that by plugging into the IB conn reference
counting correctly.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-18 17:05:31 -07:00
Andy Grover
a12f41f841 target: Rename target_allocate_tasks to target_setup_cmd_from_cdb
This patch renames a horribly misnamed function that no longer allocate
tasks to something more descriptive for it's modern use in target core.

(nab: Fix up ib_srpt to use this as well ahead of a target_submit_cmd
conversion)

Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-04-14 17:40:36 -07:00
Roland Dreier
6f9e7f01b6 IB/srpt: Remove use of transport_do_task_sg_chain()
With the modern target core, se_cmd->t_data_sg already points to a
sglist that covers the whole command.  So task_sg chaining is needless
overhead and obfuscation -- instead of splicing the split up task
sglists back into one list, we can just use the original list directly.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-04-14 17:40:32 -07:00
Roland Dreier
6f3603367b IB/srpt: Set srq_type to IB_SRQT_BASIC
Since commit 96104eda01 ("RDMA/core: Add SRQ type field"), kernel
users of SRQs need to specify srq_type = IB_SRQT_BASIC in struct
ib_srq_init_attr, or else most low-level drivers will fail in
when srpt_add_one() calls ib_create_srq() and gets -ENOSYS.

(mlx4_ib works OK nearly all of the time, because it just needs
srq_type != IB_SRQT_XRC.  And apparently nearly everyone using
ib_srpt is using mlx4 hardware)

Reported-by: Alexey Shvetsov <alexxy@gentoo.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-04-12 07:56:57 -07:00
Linus Torvalds
1ab142d499 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "This contains the usual set of updates and bugfixes to target-core +
  existing fabric module code, along with a handful of the patches
  destined for v3.3 stable.

  It also contains the necessary target-core infrastructure pieces
  required to run using tcm_qla2xxx.ko WWPNs with the new Qlogic Fibre
  Channel fabric module currently queued in target-pending/for-next-merge,
  and coming for round 2.

  The highlights for this series include:

   - Add target_submit_tmr() helper function for fabric task management
     (andy)
   - Convert tcm_fc to use target_submit_tmr() (andy)
   - Replace target core various cmd flags with a transport state (hch)
   - Convert loopback to use workqueue submission (hch)
   - Convert target core to use array_zalloc for tpg_lun_list (joern)
   - Convert target core to use array_zalloc for device_list (joern)
   - Add target core support for TMR_ABORT_TASK (nab)
   - Add target core se_sess->sess_kref + get/put helpers (nab)
   - Add target core se_node_acl->acl_kref for ->acl_free_comp usage
     (nab)
   - Convert iscsi-target to use target_put_session + sess_kref (nab)
   - Fix tcm_fc fc_exch memory leak in ft_send_resp_status (nab)
   - Fix ib_srpt srpt_handle_cmd send_ioctx->ioctx_kref leak on
     exception (nab)
   - Fix target core up handling of short INQUIRY buffers (roland)
   - Untangle target-core front-end and back-end meanings of max_sectors
     attribute (roland)
   - Set loopback residual field for SCSI commands (roland)
   - Fix target-core 16-bit target ports for SET TARGET PORT GROUPS
     emulation (roland)

  Thanks again to Andy, Christoph, Joern, Roland, and everyone who has
  contributed this round!"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (64 commits)
  ib_srpt: Fix srpt_handle_cmd send_ioctx->ioctx_kref leak on exception
  loopback: Fix transport_generic_allocate_tasks error handling
  iscsi-target: remove improper externs
  iscsi-target: Remove unused variables in iscsi_target_parameters.c
  target: remove obvious warnings
  target: Use array_zalloc for device_list
  target: Use array_zalloc for tpg_lun_list
  target: Fix sense code for unsupported SERVICE ACTION IN
  target: Remove hack to make READ CAPACITY(10) lie if thin provisioning is enabled
  target: Bump core version to v4.1.0-rc2-ml + fabric versions
  tcm_fc: Fix fc_exch memory leak in ft_send_resp_status
  target: Drop unused legacy target_core_fabric_ops API callers
  iscsi-target: Convert to use target_put_session + sess_kref
  target: Convert se_node_acl->acl_group removal to use ->acl_kref
  target: Add se_node_acl->acl_kref for ->acl_free_comp usage
  target: Add se_node_acl->acl_free_comp for NodeACL release path
  target: Add se_sess->sess_kref + get/put helpers
  target: Convert session_lock to irqsave
  target: Fix typo in drivers/target
  iscsi-target: Fix dynamic -> explict NodeACL pointer reference
  ...
2012-03-22 12:38:04 -07:00
Linus Torvalds
0c2fe82a9b InfiniBand/RDMA changes for the 3.4 merge window. Nothing big really
stands out; by patch count lots of fixes to the mlx4 driver plus some
 cleanups and fixes to the core and other drivers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPZ2THAAoJEENa44ZhAt0hMgkP/AwXjwzz6lYlIizytQ5KOH11
 CT/VoHLIGIwY31aRhZRHggWLEbJi8akgfh04K9PiSh/muFRPYk5KJDoQEoJV5Odv
 12krqtpZeL41YcAPq0wiyP3Ki5AZDCDumzWbOtmxE9m93mVfi3yFTTWDHFo/7SOx
 A2kUITdKuZhgBpCFYS23Gs8QTWcMPUlwNlM76edG90BjSySHsK15tbqTUaEOC6Ux
 SPG0p0c2YjlZCPmRObrCL65o5LFRVajinZ4rXN7rre4LEU+IuXgW+mQU2Kwy8z6a
 oMPE2l4OMSqj270r2PlVK087gGH69nKfLgLQLJiP0Id+KwdYUk7vQcA+Fu0jwjP3
 Ci/ahllvB76IiaNbax0vTy2ohCJmilLLotAP46NW0vPR2/KnmVy0Mqk8pXQQddw4
 tnAFNnafn/MjTty8GwqyXR/enJZs29ePcSxcYnKjXYRIgG4ldejL2wktgSra8+gb
 3/qFag1HRgZzcICkXGpHj7uWJ2KDe++1m6KTb0THUmrjuiel6ATFZpYoeRed3GfS
 ZMqpjZvuXM8nA9CrKMkzm5vIiUFF8Qq0hUTLR38F8FiNEhmC08JqH5PqjJ3Z18if
 R1yG4W4xmp/0ICHg4zyg6LWV3YsweDD+jSCJpW+KNBvu3HtYb41HIlK+i2TTmtPD
 3etEZZRwROAyYKcwN01p
 =DbLx
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes for the 3.4 merge window from Roland Dreier:
 "Nothing big really stands out; by patch count lots of fixes to the
  mlx4 driver plus some cleanups and fixes to the core and other
  drivers."

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (28 commits)
  mlx4_core: Scale size of MTT table with system RAM
  mlx4_core: Allow dynamic MTU configuration for IB ports
  IB/mlx4: Fix info returned when querying IBoE ports
  IB/mlx4: Fix possible missed completion event
  mlx4_core: Report thermal error events
  mlx4_core: Fix one more static exported function
  IB: Change CQE "csum_ok" field to a bit flag
  RDMA/iwcm: Reject connect requests if cmid is not in LISTEN state
  RDMA/cxgb3: Don't pass irq flags to flush_qp()
  mlx4_core: Get rid of redundant ext_port_cap flags
  RDMA/ucma: Fix AB-BA deadlock
  IB/ehca: Fix ilog2() compile failure
  IB: Use central enum for speed instead of hard-coded values
  IB/iser: Post initial receive buffers before sending the final login request
  IB/iser: Free IB connection resources in the proper place
  IB/srp: Consolidate repetitive sysfs code
  IB/srp: Use pr_fmt() and pr_err()/pr_warn()
  IB/core: Fix SDR rates in sysfs
  mlx4: Enforce device max FMR maps in FMR alloc
  IB/mlx4: Set bad_wr for invalid send opcode
  ...
2012-03-21 10:33:42 -07:00
Linus Torvalds
9f3938346a Merge branch 'kmap_atomic' of git://github.com/congwang/linux
Pull kmap_atomic cleanup from Cong Wang.

It's been in -next for a long time, and it gets rid of the (no longer
used) second argument to k[un]map_atomic().

Fix up a few trivial conflicts in various drivers, and do an "evil
merge" to catch some new uses that have come in since Cong's tree.

* 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
  feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
  highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
  drbd: remove the second argument of k[un]map_atomic()
  zcache: remove the second argument of k[un]map_atomic()
  gma500: remove the second argument of k[un]map_atomic()
  dm: remove the second argument of k[un]map_atomic()
  tomoyo: remove the second argument of k[un]map_atomic()
  sunrpc: remove the second argument of k[un]map_atomic()
  rds: remove the second argument of k[un]map_atomic()
  net: remove the second argument of k[un]map_atomic()
  mm: remove the second argument of k[un]map_atomic()
  lib: remove the second argument of k[un]map_atomic()
  power: remove the second argument of k[un]map_atomic()
  kdb: remove the second argument of k[un]map_atomic()
  udf: remove the second argument of k[un]map_atomic()
  ubifs: remove the second argument of k[un]map_atomic()
  squashfs: remove the second argument of k[un]map_atomic()
  reiserfs: remove the second argument of k[un]map_atomic()
  ocfs2: remove the second argument of k[un]map_atomic()
  ntfs: remove the second argument of k[un]map_atomic()
  ...
2012-03-21 09:40:26 -07:00
Linus Torvalds
69a7aebcf0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "It's indeed trivial -- mostly documentation updates and a bunch of
  typo fixes from Masanari.

  There are also several linux/version.h include removals from Jesper."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (101 commits)
  kcore: fix spelling in read_kcore() comment
  constify struct pci_dev * in obvious cases
  Revert "char: Fix typo in viotape.c"
  init: fix wording error in mm_init comment
  usb: gadget: Kconfig: fix typo for 'different'
  Revert "power, max8998: Include linux/module.h just once in drivers/power/max8998_charger.c"
  writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header
  writeback: fix typo in the writeback_control comment
  Documentation: Fix multiple typo in Documentation
  tpm_tis: fix tis_lock with respect to RCU
  Revert "media: Fix typo in mixer_drv.c and hdmi_drv.c"
  Doc: Update numastat.txt
  qla4xxx: Add missing spaces to error messages
  compiler.h: Fix typo
  security: struct security_operations kerneldoc fix
  Documentation: broken URL in libata.tmpl
  Documentation: broken URL in filesystems.tmpl
  mtd: simplify return logic in do_map_probe()
  mm: fix comment typo of truncate_inode_pages_range
  power: bq27x00: Fix typos in comment
  ...
2012-03-20 21:12:50 -07:00
Cong Wang
2a156d094d infiniband: remove the second argument of k[un]map_atomic()
Acked-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:18 +08:00
Roland Dreier
f0e88aeb19 Merge branches 'cma', 'cxgb3', 'cxgb4', 'ehca', 'iser', 'mad', 'nes', 'qib', 'srp' and 'srpt' into for-next 2012-03-19 09:50:33 -07:00
Nicholas Bellinger
187e70a554 ib_srpt: Fix srpt_handle_cmd send_ioctx->ioctx_kref leak on exception
This patch addresses a bug in srpt_handle_cmd() failure handling where
send_ioctx->kref is being leaked with the local extra reference after init,
causing the expected kref_put() in srpt_handle_send_comp() to not be the final
call to invoke srpt_put_send_ioctx_kref() -> transport_generic_free_cmd() and
perform se_cmd descriptor memory release.

It also fixes a SCF_SCSI_RESERVATION_CONFLICT handling bug where this code
is incorrectly falling through to transport_handle_cdb_direct() after
invoking srpt_queue_status() to send SAM_STAT_RESERVATION_CONFLICT status.

Note this patch is for >= v3.3 mainline code, and current lio-core.git
code has already been converted to target_submit_cmd() + se_cmd->cmd_kref usage,
and internal ioctx->kref usage has been removed.  I'm including this patch
now into target-pending/for-next with a CC' for v3.3 stable.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@purestorage.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-03-17 22:13:48 -07:00
Nicholas Bellinger
c7ec05c82b target: Drop unused legacy target_core_fabric_ops API callers
This patch drops the following unused legacy API callers from target_core_fabric.h:

*) TFO->fall_back_to_erl0()
*) TFO->stop_session()
*) TFO->sess_logged_in()
*) TFO->is_state_remove()

This patch also removes the stub usage in loopback, tcm_fc, iscsi_target,
and ib_srpt fabric modules.

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-03-10 14:42:55 -08:00
Or Gerlitz
d927d505c5 IB: Change CQE "csum_ok" field to a bit flag
Use a bit in wc_flags rather then a whole integer to hold the
"checksum OK" flag.  By itself, this change doesn't reduce the size of
struct ib_wc on 64bit machines -- it stays on 56 bytes because of
padding.  However, it will allow to add more fields in the future
without enlarging the struct.  Also, it will let us have a unified
approach with future libibverbs checksum offload reporting, because a
bit flag doesn't break the library ABI.

This patch was suggested during conversation with Liran Liss
<liranl@mellanox.com>.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-08 12:34:27 -08:00
Or Gerlitz
89e984e2c2 IB/iser: Post initial receive buffers before sending the final login request
An iser target may send iscsi NO-OP PDUs as soon as it marks the iSER
iSCSI session as fully operative.  This means that there is window
where there are no posted receive buffers on the initiator side, so
it's possible for the iSER RC connection to break because of RNR NAK /
retry errors.  To fix this, rely on the flags bits in the login
request to have FFP (0x3) in the lower nibble as a marker for the
final login request, and post an initial chunk of receive buffers
before sending that login request instead of after getting the login
response.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-05 08:53:05 -08:00
Doug Ledford
d474186f19 IB/iser: Free IB connection resources in the proper place
We allocate the login dma buffers in iser_verbs.c as part of
alloc_ib_conn_resources(), however we are freeing them in
iser_initiator.c as part of iser_free_rx_descriptors().  This is
needlessly confusing.  We have an alloc_rx_descriptors() and it
doesn't alloc something that the free_rx_descriptors() frees, and we
have an alloc_ib_conn_resources() that allocs something not freed by
free_ib_conn_resources().  Clean that up.

Signed-off-by: Doug Ledford <dledford@redhat.com>

[ Fix build error in iser_free_ib_conn_res().  - Or ]

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-05 00:23:27 -08:00
Bart Van Assche
683b159a2e IB/srp: Consolidate repetitive sysfs code
Remove sysfs attributes before removing a target instead of testing
the target state in every sysfs attribute callback method. Note: it is
safe to invoke a sysfs attribute removal method like
device_remove_file() twice on the same attribute.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-27 09:27:57 -08:00
Bart Van Assche
e0bda7d8c3 IB/srp: Use pr_fmt() and pr_err()/pr_warn()
Use pr_fmt() and pr_xxx() instead of more verbose printk() equivalents.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-27 09:26:30 -08:00
Masanari Iida
a776ce7cfc IB/srpt: Fix typo "alocate" -> "allocate"
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-25 17:49:37 -08:00
Andy Grover
c8e31f26fe target: Add SCF_SCSI_TMR_CDB usage and drop se_tmr_req_cache
Change the test for if a cmd is a tmr request to checking if
SCF_SCSI_TMR_CDB (a new flag) is set in cmd->se_cmd_flags.

Also remove se_tmr_req_cache usage in favor of kzalloc usage,
and make core_tmr_alloc_req() return int + setup se_cmd->se_tmr_req
directly and fix up various fabric module usages

Cc: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-02-25 14:37:47 -08:00
Christoph Hellwig
7d680f3b74 target: replace various cmd flags with a transport state
Replace various atomic_ts used as flags in struct se_cmd with a single
transport_state bitmap that requires t_state_lock to be held for modifications.

In the target core that assumption generally is true, but some recently added
code in the SRP target had to grow new lock calls.  I can't say I like the way
how it messes with the command state directly, but let's leave that for later.

(Re-add missing ib_srpt.c changes that nab dropped..)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-02-25 14:37:45 -08:00
Linus Torvalds
8df54d622a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Quoth David:

1) GRO MAC header comparisons were ethernet specific, breaking other
   link types.  This required a multi-faceted fix to cure the originally
   noted case (Infiniband), because IPoIB was lying about it's actual
   hard header length.  Thanks to Eric Dumazet, Roland Dreier, and
   others.

2) Fix build failure when INET_UDP_DIAG is built in and ipv6 is modular.
   From Anisse Astier.

3) Off by ones and other bug fixes in netprio_cgroup from Neil Horman.

4) ipv4 TCP reset generation needs to respect any network interface
   binding from the socket, otherwise route lookups might give a
   different result than all the other segments received.  From Shawn
   Lu.

5) Fix unintended regression in ipv4 proxy ARP responses, from Thomas
   Graf.

6) Fix SKB under-allocation bug in sh_eth, from Yoshihiro Shimoda.

7) Revert skge PCI mapping changes that are causing crashes for some
   folks, from Stephen Hemminger.

8) IPV4 route lookups fill in the wildcarded fields of the given flow
   lookup key passed in, which is fine most of the time as this is
   exactly what the caller's want.  However there are a few cases that
   want to retain the original flow key values afterwards, so handle
   those cases properly.  Fix from Julian Anastasov.

9) IGB/IXGBE VF lookup bug fixes from Greg Rose.

10) Properly null terminate filename passed to ethtool flash device
    method, from Ben Hutchings.

11) S3 resume fix in via-velocity from David Lv.

12) Fix double SKB free during xmit failure in CAIF, from Dmitry
    Tarnyagin.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits)
  net: Don't proxy arp respond if iif == rt->dst.dev if private VLAN is disabled
  ipv4: Fix wrong order of ip_rt_get_source() and update iph->daddr.
  netprio_cgroup: fix wrong memory access when NETPRIO_CGROUP=m
  netprio_cgroup: don't allocate prio table when a device is registered
  netprio_cgroup: fix an off-by-one bug
  bna: fix error handling of bnad_get_flash_partition_by_offset()
  isdn: type bug in isdn_net_header()
  net: Make qdisc_skb_cb upper size bound explicit.
  ixgbe: ethtool: stats user buffer overrun
  ixgbe: dcb: up2tc mapping lost on disable/enable CEE DCB state
  ixgbe: do not update real num queues when netdev is going away
  ixgbe: Fix broken dependency on MAX_SKB_FRAGS being related to page size
  ixgbe: Fix case of Tx Hang in PF with 32 VFs
  ixgbe: fix vf lookup
  igb: fix vf lookup
  e1000: add dropped DMA receive enable back in for WoL
  gro: more generic L2 header check
  IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses
  zd1211rw: firmware needs duration_id set to zero for non-pspoll frames
  net: enable TC35815 for MIPS again
  ...
2012-02-10 14:18:46 -08:00
Masanari Iida
7367d99b30 SRP: Fix typo in ib_srpt.c
Correct spelling "alocate" to "allocate" in
drivers/infiniband/ulp/srpt/ib_srpt.c

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-02-09 23:09:54 +01:00
Roland Dreier
936d7de3d7 IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses
Commit a0417fa3a1 ("net: Make qdisc_skb_cb upper size bound
explicit.") made it possible for a netdev driver to use skb->cb
between its header_ops.create method and its .ndo_start_xmit
method.  Use this in ipoib_hard_header() to stash away the LL address
(GID + QPN), instead of the "ipoib_pseudoheader" hack.  This allows
IPoIB to stop lying about its hard_header_len, which will let us fix
the L2 check for GRO.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-08 18:26:54 -05:00
Jesper Juhl
715252d419 IB/srpt: Don't return freed pointer from srpt_alloc_ioctx_ring()
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-06 08:57:11 -08:00
Dan Carpenter
3af336376f IB/srpt: Fix ERR_PTR() vs. NULL checking confusion
transport_init_session() and target_fabric_configfs_init() don't
return NULL pointers, they only return ERR_PTRs or valid pointers.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-03 09:07:00 -08:00
Jesper Juhl
ebfded8c4b IB/srpt: Remove unneeded <linux/version.h> include
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-02 12:55:59 -08:00
Roland Dreier
f225066b64 IB/srpt: Use ARRAY_SIZE() instead of open-coding
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-02 12:55:58 -08:00
Roland Dreier
486d8b9f88 IB/srpt: Use DEFINE_SPINLOCK()/LIST_HEAD()
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-02 12:55:58 -08:00
Linus Torvalds
f59e842fc0 Merge branch 'for-next-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
* 'for-next-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  ib_srpt: Initial SRP Target merge for v3.3-rc1
2012-01-18 16:29:42 -08:00
Linus Torvalds
972b2c7199 Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits)
  reiserfs: Properly display mount options in /proc/mounts
  vfs: prevent remount read-only if pending removes
  vfs: count unlinked inodes
  vfs: protect remounting superblock read-only
  vfs: keep list of mounts for each superblock
  vfs: switch ->show_options() to struct dentry *
  vfs: switch ->show_path() to struct dentry *
  vfs: switch ->show_devname() to struct dentry *
  vfs: switch ->show_stats to struct dentry *
  switch security_path_chmod() to struct path *
  vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
  vfs: trim includes a bit
  switch mnt_namespace ->root to struct mount
  vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
  vfs: opencode mntget() mnt_set_mountpoint()
  vfs: spread struct mount - remaining argument of next_mnt()
  vfs: move fsnotify junk to struct mount
  vfs: move mnt_devname
  vfs: move mnt_list to struct mount
  vfs: switch pnode.h macros to struct mount *
  ...
2012-01-08 12:19:57 -08:00
Al Viro
587a1f1659 switch ->is_visible() to returning umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:55 -05:00
Bart Van Assche
a42d985bd5 ib_srpt: Initial SRP Target merge for v3.3-rc1
This patch adds the kernel module ib_srpt SCSI RDMA Protocol (SRP) target
implementation conforming to the SRP r16a specification for the mainline
drivers/target infrastructure.

This driver was originally developed by Vu Pham and has been optimized by
Bart Van Assche and merged into upstream LIO based on his srpt-lio-4.1
branch here:

   https://github.com/bvanassche/srpt-lio/commits/srpt-lio-4.1/

This updated patch also contains the following two changes from
lio-core-2.6.git/master.  One is to fix a bug with 1 >= task->task_sg[]
chained mappings in ib_srpt, and the other to convert the configfs control
plane to reference IB Port GUID and struct srpt_port directly following
mainline v4.x target_core_fabric_configfs.c convertion for ib_srpt
to work with rtslib/rtsadmin v2 code.

These seperate patches can be found here:

ib_srpt: Fix bug with chainged SGLs in srpt_map_sg_to_ib_sge
http://www.risingtidesystems.com/git/?p=lio-core-2.6.git;a=commitdiff;h=ea485147563b6555a97dbf811825fbb586519252

ib_srpt: Convert se_wwn endpoint reference to struct srpt_port->port_wwn
http://www.risingtidesystems.com/git/?p=lio-core-2.6.git;a=commitdiff;h=4e544a210acb227df1bb4ca5086e65bdf4e648ea

This also includes the following recent v1 -> v2 review changes:

ib_srpt: Fix potential out-of-bounds array access
ib_srpt: Avoid failed multipart RDMA transfers
ib_srpt: Fix srpt_alloc_fabric_acl failure case return value
ib_srpt: Update comments to reference $driver/$port layout
ib_srpt: Fix sport->port_guid formatting code
ib_srpt: Remove legacy use_port_guid_in_session_name module parameter
ib_srpt: Convert srp_max_rdma_size into per port configfs attribute
ib_srpt: Convert srp_max_rsp_size into per port configfs attribute
ib_srpt: Convert srpt_sq_size into per port configfs attribute

and v2 -> v3 review changes:

ib_srpt: Fix possible race with srp_sq_size in srpt_create_ch_ib
ib_srpt: Fix possible race with srp_max_rsp_size in srpt_release_channel_work
ib_srpt: Fix up MAX_SRPT_RDMA_SIZE define
ib_srpt: Make srpt_map_sg_to_ib_sge() failure case return -EAGAIN
ib_srpt: Convert port_guid to use subnet_prefix + interface_id formatting
ib_srpt: Make srpt_check_stop_free return kref_put status
ib_srpt: Make compilation with BUG=n proceed`
ib_srpt: Use new target_core_fabric.h include
ib_srpt: Check hex2bin() return code to silence build warning

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Vu Pham <vu@mellanox.com>
Cc: David Dillow <dillowda@ornl.gov>
Signed-off-by: Nicholas A. Bellinger <nab@risingtidesystems.com>
2011-12-16 06:33:56 +00:00
David Miller
17e6abeec4 infiniband: ipoib: Sanitize neighbour handling in ipoib_main.c
Reduce the number of dst_get_neighbour_noref() calls within a single
call chain.  Primarily by passing the neighbour pointer down to the
helper functions.

Handle dst_get_neighbour_noref() returning NULL in ipoib_start_xmit()
by incrementing the dropped counter and freeing the packet.  We don't
want it to fall through into the ARP/RARP/multicast handling, since
that should only happen when skb_dst() is NULL.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Roland Dreier <roland@purestorage.com>
2011-12-05 15:20:20 -05:00
David Miller
2721745501 net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}.
To reflect the fact that a refrence is not obtained to the
resulting neighbour entry.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Roland Dreier <roland@purestorage.com>
2011-12-05 15:20:19 -05:00
David S. Miller
b3613118eb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2011-12-02 13:49:21 -05:00
David Miller
596b9b68ef neigh: Add infrastructure for allocating device neigh privates.
netdev->neigh_priv_len records the private area length.

This will trigger for neigh_table objects which set tbl->entry_size
to zero, and the first instances of this will be forthcoming.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30 18:46:43 -05:00
Roland Dreier
a493f1a24a Merge branches 'cxgb4', 'ipoib', 'misc' and 'qib' into for-next 2011-11-29 18:01:53 -08:00
Eric Dumazet
580da35a31 IB: Fix RCU lockdep splats
Commit f2c31e32b3 ("net: fix NULL dereferences in check_peer_redir()")
forgot to take care of infiniband uses of dst neighbours.

Many thanks to Marc Aurele who provided a nice bug report and feedback.

Reported-by: Marc Aurele La France <tsi@ualberta.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-29 13:37:11 -08:00
Mike Marciniszyn
3874397c0b IB/ipoib: Prevent hung task or softlockup processing multicast response
This following can occur with ipoib when processing a multicast reponse:

    BUG: soft lockup - CPU#0 stuck for 67s! [ib_mad1:982]
    Modules linked in: ...
    CPU 0:
    Modules linked in: ...
    Pid: 982, comm: ib_mad1 Not tainted 2.6.32-131.0.15.el6.x86_64 #1 ProLiant DL160 G5
    RIP: 0010:[<ffffffff814ddb27>]  [<ffffffff814ddb27>] _spin_unlock_irqrestore+0x17/0x20
    RSP: 0018:ffff8802119ed860  EFLAGS: 00000246
    0000000000000004 RBX: ffff8802119ed860 RCX: 000000000000a299
    RDX: ffff88021086c700 RSI: 0000000000000246 RDI: 0000000000000246
    RBP: ffffffff8100bc8e R08: ffff880210ac229c R09: 0000000000000000
    R10: ffff88021278aab8 R11: 0000000000000000 R12: ffff8802119ed860
    R13: ffffffff8100be6e R14: 0000000000000001 R15: 0000000000000003
    FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 00000000006d4840 CR3: 0000000209aa5000 CR4: 00000000000406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Call Trace:
    [<ffffffffa032c247>] ? ipoib_mcast_send+0x157/0x480 [ib_ipoib]
    [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
    [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
    [<ffffffffa03283d4>] ? ipoib_path_lookup+0x124/0x2d0 [ib_ipoib]
    [<ffffffffa03286fc>] ? ipoib_start_xmit+0x17c/0x430 [ib_ipoib]
    [<ffffffff8141e758>] ? dev_hard_start_xmit+0x2c8/0x3f0
    [<ffffffff81439d0a>] ? sch_direct_xmit+0x15a/0x1c0
    [<ffffffff81423098>] ? dev_queue_xmit+0x388/0x4d0
    [<ffffffffa032d6b7>] ? ipoib_mcast_join_finish+0x2c7/0x510 [ib_ipoib]
    [<ffffffffa032dab8>] ? ipoib_mcast_sendonly_join_complete+0x1b8/0x1f0 [ib_ipoib]
    [<ffffffffa02a0946>] ? mcast_work_handler+0x1a6/0x710 [ib_sa]
    [<ffffffffa015f01e>] ? ib_send_mad+0xfe/0x3c0 [ib_mad]
    [<ffffffffa00f6c93>] ? ib_get_cached_lmc+0xa3/0xb0 [ib_core]
    [<ffffffffa02a0f9b>] ? join_handler+0xeb/0x200 [ib_sa]
    [<ffffffffa029e4fc>] ? ib_sa_mcmember_rec_callback+0x5c/0xa0 [ib_sa]
    [<ffffffffa029e79c>] ? recv_handler+0x3c/0x70 [ib_sa]
    [<ffffffffa01603a4>] ? ib_mad_completion_handler+0x844/0x9d0 [ib_mad]
    [<ffffffffa015fb60>] ? ib_mad_completion_handler+0x0/0x9d0 [ib_mad]
    [<ffffffff81088830>] ? worker_thread+0x170/0x2a0
    [<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40
    [<ffffffff810886c0>] ? worker_thread+0x0/0x2a0
    [<ffffffff8108ddf6>] ? kthread+0x96/0xa0
    [<ffffffff8100c1ca>] ? child_rip+0xa/0x20

Coinciding with stack trace is the following message:

    ib0: ib_address_create failed

The code below in ipoib_mcast_join_finish() will note the above
failure in the address handle but otherwise continue:

                ah = ipoib_create_ah(dev, priv->pd, &av);
                if (!ah) {
                        ipoib_warn(priv, "ib_address_create failed\n");
                } else {

The while loop at the bottom of ipoib_mcast_join_finish() will attempt
to send queued multicast packets in mcast->pkt_queue and eventually
end up in ipoib_mcast_send():

        if (!mcast->ah) {
                if (skb_queue_len(&mcast->pkt_queue) < IPOIB_MAX_MCAST_QUEUE)
                        skb_queue_tail(&mcast->pkt_queue, skb);
                else {
                        ++dev->stats.tx_dropped;
                        dev_kfree_skb_any(skb);
                }

My read is that the code will requeue the packet and return to the
ipoib_mcast_join_finish() while loop and the stage is set for the
"hung" task diagnostic as the while loop never sees a non-NULL ah, and
will do nothing to resolve.

There are GFP_ATOMIC allocates in the provider routines, so this is
possible and should be dealt with.

The test that induced the failure is associated with a host SM on the
same server during a shutdown.

This patch causes ipoib_mcast_join_finish() to exit with an error
which will flush the queued mcast packets.  Nothing is done to unwind
the QP attached state so that subsequent sends from above will retry
the join.

Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Reviewed-by: Gary Leshner <gary.leshner@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-29 13:20:02 -08:00
David S. Miller
9ca36f7db2 infiniband: Update net drivers for netdev_features_t changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-16 18:05:50 -05:00
Linus Torvalds
32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Or Gerlitz
52439540ea IB/iser: DMA unmap TX bufs used for iSCSI/iSER headers
The current driver never does DMA unmapping on these buffers.  Fix that
by adding DMA unmapping to the task cleanup callback, and DMA mapping to
the task init function (drop the headers_initialized micro-optimization).

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-04 09:32:44 -07:00
Or Gerlitz
2c4ce60934 IB/iser: Use separate buffers for the login request/response
The driver counted on the transactional nature of iSCSI login/text
flows and used the same buffer for both the request and the response.
We also went further and did DMA mapping only once, with
DMA_FROM_DEVICE, which violates the DMA mapping API.  Fix that by
using different buffers, one for requests and one for responses, and
use the correct DMA mapping direction for each.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-04 09:30:52 -07:00
Linus Torvalds
f470f8d4e7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits)
  mlx4_core: Deprecate log_num_vlan module param
  IB/mlx4: Don't set VLAN in IBoE WQEs' control segment
  IB/mlx4: Enable 4K mtu for IBoE
  RDMA/cxgb4: Mark QP in error before disabling the queue in firmware
  RDMA/cxgb4: Serialize calls to CQ's comp_handler
  RDMA/cxgb3: Serialize calls to CQ's comp_handler
  IB/qib: Fix issue with link states and QSFP cables
  IB/mlx4: Configure extended active speeds
  mlx4_core: Add extended port capabilities support
  IB/qib: Hold links until tuning data is available
  IB/qib: Clean up checkpatch issue
  IB/qib: Remove s_lock around header validation
  IB/qib: Precompute timeout jiffies to optimize latency
  IB/qib: Use RCU for qpn lookup
  IB/qib: Eliminate divide/mod in converting idx to egr buf pointer
  IB/qib: Decode path MTU optimization
  IB/qib: Optimize RC/UC code by IB operation
  IPoIB: Use the right function to do DMA unmap pages
  RDMA/cxgb4: Use correct QID in insert_recv_cqe()
  RDMA/cxgb4: Make sure flush CQ entries are collected on connection close
  ...
2011-11-01 10:51:38 -07:00
Roland Dreier
504255f8d0 Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'fdr', 'ipath', 'ipoib', 'misc', 'mlx4', 'misc', 'nes', 'qib' and 'xrc' into for-next 2011-11-01 09:37:08 -07:00
Paul Gortmaker
fec14d2fce infiniband: add moduleparam.h to drivers/infiniband as required
These files were getting the moduleparam infrastructure from the
implicit presence of module.h being everywhere, but that is going
away soon.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:36 -04:00
Paul Gortmaker
b108d9764c infiniband: add in export.h for files using EXPORT_SYMBOL/THIS_MODULE
These were getting it implicitly via device.h --> module.h but
we are going to stop that when we clean up the headers.

Fix these in advance so the tree remains biscect-clean.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:35 -04:00
Paul Gortmaker
e4dd23d753 infiniband: Fix up module files that need to include module.h
They had been getting it implicitly via device.h but we can't
rely on that for the future, due to a pending cleanup so fix
it now.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:35 -04:00
Linus Torvalds
ec7ae51753 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (204 commits)
  [SCSI] qla4xxx: export address/port of connection (fix udev disk names)
  [SCSI] ipr: Fix BUG on adapter dump timeout
  [SCSI] megaraid_sas: Fix instance access in megasas_reset_timer
  [SCSI] hpsa: change confusing message to be more clear
  [SCSI] iscsi class: fix vlan configuration
  [SCSI] qla4xxx: fix data alignment and use nl helpers
  [SCSI] iscsi class: fix link local mispelling
  [SCSI] iscsi class: Replace iscsi_get_next_target_id with IDA
  [SCSI] aacraid: use lower snprintf() limit
  [SCSI] lpfc 8.3.27: Change driver version to 8.3.27
  [SCSI] lpfc 8.3.27: T10 additions for SLI4
  [SCSI] lpfc 8.3.27: Fix queue allocation failure recovery
  [SCSI] lpfc 8.3.27: Change algorithm for getting physical port name
  [SCSI] lpfc 8.3.27: Changed worst case mailbox timeout
  [SCSI] lpfc 8.3.27: Miscellanous logic and interface fixes
  [SCSI] megaraid_sas: Changelog and version update
  [SCSI] megaraid_sas: Add driver workaround for PERC5/1068 kdump kernel panic
  [SCSI] megaraid_sas: Add multiple MSI-X vector/multiple reply queue support
  [SCSI] megaraid_sas: Add support for MegaRAID 9360/9380 12GB/s controllers
  [SCSI] megaraid_sas: Clear FUSION_IN_RESET before enabling interrupts
  ...
2011-10-28 16:44:18 -07:00
Eric Dumazet
9e903e0852 net: add skb frag size accessors
To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.

Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 03:10:46 -04:00
Dotan Barak
787adb9d6a IPoIB: Use the right function to do DMA unmap pages
Pages that were mapped using ib_dma_map_page() should be unmapped
using ib_dma_unmap_page().

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-18 10:08:31 -07:00
Sean Hefty
96104eda01 RDMA/core: Add SRQ type field
Currently, there is only a single ("basic") type of SRQ, but with XRC
support we will add a second.  Prepare for this by defining an SRQ type
and setting all current users to IB_SRQT_BASIC.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:13:26 -07:00
Marcel Apfelbaum
e36fb88a9a IPoIB: Handle extended rates in debugfs
Use new function ib_rate_to_mbps() to handle printing rate in debugfs,
so that we handle extended rates.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-11 11:57:08 -07:00
David S. Miller
8decf86879 Merge branch 'master' of github.com:davem330/net
Conflicts:
	MAINTAINERS
	drivers/net/Kconfig
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
	drivers/net/ethernet/broadcom/tg3.c
	drivers/net/wireless/iwlwifi/iwl-pci.c
	drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c
	drivers/net/wireless/rt2x00/rt2800usb.c
	drivers/net/wireless/wl12xx/main.c
2011-09-22 03:23:13 -04:00
Mike Christie
f27fb2ef7b [SCSI] iscsi class: sysfs group is_visible callout for iscsi host attrs
The iscsi class currently does not support writable sysfs
attrs for LLD sysfs settings. This patch converts the
iscsi class and driver's host attrs to use the attribute
container sysfs group and the sysfs group's is_visible callout
to be able to support readable or writable sysfs attrs.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:14 -06:00
Mike Christie
1d063c1729 [SCSI] iscsi class: sysfs group is_visible callout for session attrs
The iscsi class currently does not support writable sysfs
attrs for LLD sysfs settings. This patch converts the
iscsi class and driver's session attrs to use the attribute
container sysfs group and the sysfs group's is_visible callout
to be able to support readable or writable sysfs attrs.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:06 -06:00
Mike Christie
3128c6c73c [SCSI] iscsi cls: sysfs group is_visible callout for conn attrs
The iscsi class currently does not support writable sysfs
attrs for LLD sysfs settings. This patch converts the
iscsi class and drivers to use the attribute container
sysfs group and the sysfs group's is_visible callout
to be able to support readable or writable sysfs attrs.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:03 -06:00
Ian Campbell
5581be3b48 IPoIB: convert to SKB paged frag API.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-26 12:38:42 -04:00
Jiri Pirko
afc4b13df1 net: remove use of ndo_set_multicast_list in drivers
replace it by ndo_set_rx_mode

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17 20:22:03 -07:00
Roland Dreier
80b43de837 Merge branches 'ipoib' and 'iser' into for-next 2011-08-17 10:57:43 -07:00
Or Gerlitz
200ae1a08b IB/iser: Support iSCSI PDU padding
RFC3270 mandates that iSCSI PDUs are padded to the closest integer
number of four byte words.  Fix the iser code to support that on both
the TX/RX flows.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-08-17 09:45:07 -07:00
Or Gerlitz
0ace64b85e IBiser: Fix wrong mask when sizeof (dma_addr_t) > sizeof (unsigned long)
The code that prepares the SG associated with SCSI command for FMR was
buggy for systems with DMA addresses that don't fit in unsigned long,
e.g under the 32-bit based XenServer dom0 sizeof(dma_addr_t) is 8.

Fix that by casting to unsigned long long a masking constant used by
the code. This resolves a crash in iser_sg_to_page_vec on this system.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-08-17 09:40:55 -07:00
Bernd Schubert
22cfb0bf67 IPoIB: Fix possible NULL dereference in ipoib_start_xmit()
Fix a bug introduced in 69cce1d140 ("net: Abstract dst->neighbour
accesses behind helpers.") where we might dereference skb_dst(skb)
even if it is NULL, which causes:

    [  240.944030] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
    [  240.948007] IP: [<ffffffffa0366ce9>] ipoib_start_xmit+0x39/0x280 [ib_ipoib]
    [...]
    [  240.948007] Call Trace:
    [  240.948007]  <IRQ>
    [  240.948007]  [<ffffffff812cd5e0>] dev_hard_start_xmit+0x2a0/0x590
    [  240.948007]  [<ffffffff8131f680>] ? arp_create+0x70/0x200
    [  240.948007]  [<ffffffff812e8e1f>] sch_direct_xmit+0xef/0x1c0

Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=41212
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-08-16 10:19:20 -07:00
Linus Torvalds
91d41fdf31 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: Convert to DIV_ROUND_UP_SECTOR_T usage for sectors / dev_max_sectors
  kernel.h: Add DIV_ROUND_UP_ULL and DIV_ROUND_UP_SECTOR_T macro usage
  iscsi-target: Add iSCSI fabric support for target v4.1
  iscsi: Add Serial Number Arithmetic LT and GT into iscsi_proto.h
  iscsi: Use struct scsi_lun in iscsi structs instead of u8[8]
  iscsi: Resolve iscsi_proto.h naming conflicts with drivers/target/iscsi
2011-07-27 13:21:40 -07:00
Arun Sharma
60063497a9 atomic: use <linux/atomic.h>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:47 -07:00
Nicholas Bellinger
123521830c iscsi: Resolve iscsi_proto.h naming conflicts with drivers/target/iscsi
This patch renames the following iscsi_proto.h structures to avoid
namespace issues with drivers/target/iscsi/iscsi_target_core.h:

*) struct iscsi_cmd -> struct iscsi_scsi_req
*) struct iscsi_cmd_rsp -> struct iscsi_scsi_rsp
*) struct iscsi_login -> struct iscsi_login_req

This patch includes useful ISCSI_FLAG_LOGIN_[CURRENT,NEXT]_STAGE*,
and ISCSI_FLAG_SNACK_TYPE_* definitions used by iscsi_target_mod, and
fixes the incorrect definition of struct iscsi_snack to following
RFC-3720 Section 10.16. SNACK Request.

Also, this patch updates libiscsi, iSER, be2iscsi, and bn2xi to
use the updated structure definitions in a handful of locations.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
2011-07-25 07:18:45 +00:00
Linus Torvalds
ece236ce2f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits)
  IB/qib: Defer HCA error events to tasklet
  mlx4_core: Bump the driver version to 1.0
  RDMA/cxgb4: Use printk_ratelimited() instead of printk_ratelimit()
  IB/mlx4: Support PMA counters for IBoE
  IB/mlx4: Use flow counters on IBoE ports
  IB/pma: Add include file for IBA performance counters definitions
  mlx4_core: Add network flow counters
  mlx4_core: Fix location of counter index in QP context struct
  mlx4_core: Read extended capabilities into the flags field
  mlx4_core: Extend capability flags to 64 bits
  IB/mlx4: Generate GID change events in IBoE code
  IB/core: Add GID change event
  RDMA/cma: Don't allow IPoIB port space for IBoE
  RDMA: Allow for NULL .modify_device() and .modify_port() methods
  IB/qib: Update active link width
  IB/qib: Fix potential deadlock with link down interrupt
  IB/qib: Add sysfs interface to read free contexts
  IB/mthca: Remove unnecessary read of PCI_CAP_ID_EXP
  IB/qib: Remove double define
  IB/qib: Remove unnecessary read of PCI_CAP_ID_EXP
  ...
2011-07-22 14:50:12 -07:00
David S. Miller
69cce1d140 net: Abstract dst->neighbour accesses behind helpers.
dst_{get,set}_neighbour()

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-17 23:11:35 -07:00
Bart Van Assche
fd1b6c4a69 IB/srp: Avoid duplicate devices from LUN scan
SCSI scanning of a channel🆔lun triplet in Linux works as follows
(function scsi_scan_target() in drivers/scsi/scsi_scan.c):

- If lun == SCAN_WILD_CARD, send a REPORT LUNS command to the target
  and process the result.

- If lun != SCAN_WILD_CARD, send an INQUIRY command to the LUN
  corresponding to the specified channel🆔lun triplet to verify
  whether the LUN exists.

So a SCSI driver must either take the channel and target id values in
account in its quecommand() function or it should declare that it only
supports one channel and one target id.

Currently the ib_srp driver does neither.  As a result scanning the
SCSI bus via e.g. rescan-scsi-bus.sh causes many duplicate SCSI
devices to be created. For each 0:0:L device, several duplicates are
created with the same LUN number and with (C:I) != (0:0). Fix this by
declaring that the ib_srp driver only supports one channel and one
target id.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@kernel.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-13 09:19:16 -07:00
Alexey Dobriyan
a6b7a40786 net: remove interrupt.h inclusion from netdevice.h
* remove interrupt.g inclusion from netdevice.h -- not needed
* fixup fallout, add interrupt.h and hardirq.h back where needed.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-06 22:55:11 -07:00
Linus Torvalds
4c171acc20 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA/cma: Save PID of ID's owner
  RDMA/cma: Add support for netlink statistics export
  RDMA/cma: Pass QP type into rdma_create_id()
  RDMA: Update exported headers list
  RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h>
  RDMA/nes: Add a check for strict_strtoul()
  RDMA/cxgb3: Don't post zero-byte read if endpoint is going away
  RDMA/cxgb4: Use completion objects for event blocking
  IB/srp: Fix integer -> pointer cast warnings
  IB: Add devnode methods to cm_class and umad_class
  IB/mad: Return EPROTONOSUPPORT when an RDMA device lacks the QP required
  IB/uverbs: Add devnode method to set path/mode
  RDMA/ucma: Add .nodename/.mode to tell userspace where to create device node
  RDMA: Add netlink infrastructure
  RDMA: Add error handling to ib_core_init()
2011-05-26 12:13:57 -07:00
Roland Dreier
8dc4abdf4c Merge branches 'cma', 'cxgb3', 'cxgb4', 'misc', 'nes', 'netlink', 'srp' and 'uverbs' into for-next 2011-05-25 13:47:20 -07:00
Sean Hefty
b26f9b9949 RDMA/cma: Pass QP type into rdma_create_id()
The RDMA CM currently infers the QP type from the port space selected
by the user.  In the future (eg with RDMA_PS_IB or XRC), there may not
be a 1-1 correspondence between port space and QP type.  For netlink
export of RDMA CM state, we want to export the QP type to userspace,
so it is cleaner to explicitly associate a QP type to an ID.

Modify rdma_create_id() to allow the user to specify the QP type, and
use it to make our selections of datagram versus connected mode.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-25 13:46:23 -07:00
Roland Dreier
737b94eb41 IB/srp: Fix integer -> pointer cast warnings
Fix

    drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_handle_recv':
    drivers/infiniband/ulp/srp/ib_srp.c:1150: warning: cast to pointer from integer of different size
    drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_send_completion':
    drivers/infiniband/ulp/srp/ib_srp.c🔢 warning: cast to pointer from integer of different size

by adding an intermediate cast to uintptr_t.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: David Dillow <dillowda@ornl.gov>
2011-05-23 11:30:04 -07:00
Michał Mirosław
3d96c74d89 net: infiniband/ulp/ipoib: convert to hw_features
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-20 01:30:42 -07:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Linus Torvalds
0625bef606 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB: Increase DMA max_segment_size on Mellanox hardware
  IB/mad: Improve an error message so error code is included
  RDMA/nes: Don't print success message at level KERN_ERR
  RDMA/addr: Fix return of uninitialized ret value
  IB/srp: try to use larger FMR sizes to cover our mappings
  IB/srp: add support for indirect tables that don't fit in SRP_CMD
  IB/srp: rework mapping engine to use multiple FMR entries
  IB/srp: allow sg_tablesize to be set for each target
  IB/srp: move IB CM setup completion into its own function
  IB/srp: always avoid non-zero offsets into an FMR
2011-03-24 07:59:46 -07:00
David Dillow
be8b981453 IB/srp: try to use larger FMR sizes to cover our mappings
Now that we can get larger SG lists, we can take advantage of HCAs that
allow us to use larger FMR sizes. In many cases, we can use up to 512
entries, so start there and work our way down.

Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-03-15 19:41:30 -04:00
David Dillow
c07d424d61 IB/srp: add support for indirect tables that don't fit in SRP_CMD
This allows us to guarantee the ability to submit up to 8 MB requests
based on the current value of SCSI_MAX_SG_CHAIN_SEGMENTS. While FMR will
usually condense the requests into 8 SG entries, it is imperative that
the target support external tables in case the FMR mapping fails or is
not supported.

We add a safety valve to allow targets without the needed support to
reap the benefits of the large tables, but fail in a manner that lets
the user know that the data didn't make it to the device. The user must
add "allow_ext_sg=1" to the target parameters to indicate that the
target has the needed support.

If indirect_sg_entries is not specified in the modules options, then
the sg_tablesize for the target will default to cmd_sg_entries unless
overridden by the target options.

Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-03-15 19:37:23 -04:00
David Dillow
8f26c9ff9c IB/srp: rework mapping engine to use multiple FMR entries
Instead of forcing all of the S/G entries to fit in one FMR, and falling
back to indirect descriptors if that fails, allow the use of as many
FMRs as needed to map the request. This lays the groundwork for allowing
indirect descriptor tables that are larger than can fit in the command
IU, but should marginally improve performance now by reducing the number
of indirect descriptors needed.

We increase the minimum page size for the FMR pool to 4K, as larger
pages help increase the coverage of each FMR, and it is rare that the
kernel would send down a request with scattered 512 byte fragments.

This patch also move some of the target initialization code afte the
parsing of options, to keep it together with the new code that needs to
allocate memory based on the options given.

Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-03-15 19:35:16 -04:00
David Dillow
4924864404 IB/srp: allow sg_tablesize to be set for each target
Different configurations of target software allow differing max sizes of
the command IU. Allowing this to be changed per-target allows all
targets on an initiator to get an optimal setting.

We deprecate srp_sg_tablesize and replace it with cmd_sg_entries in
preparation for allowing more indirect descriptors than can fit in the
IU.

Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-03-15 19:35:05 -04:00
David Dillow
961e0be89a IB/srp: move IB CM setup completion into its own function
This is to clean up prior to further changes.

Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-03-15 19:34:48 -04:00
David Dillow
8c4037b501 IB/srp: always avoid non-zero offsets into an FMR
It is unclear exactly how this code works around Mellanox SRP targets,
or if the problem is on the target side or in the HCA itself. In an
abundance of caution, we should always enable the workaround.

Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-03-15 19:34:28 -04:00
Mike Christie
7c53c6f89d [SCSI] iser: export addr and port
This pactch has iser export the address and port
of the endpoint.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-02-24 12:41:23 -05:00
David Rientjes
6a108a14fa kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT
The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
is used to configure any non-standard kernel with a much larger scope than
only small devices.

This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
references to the option throughout the kernel.  A new CONFIG_EMBEDDED
option is added that automatically selects CONFIG_EXPERT when enabled and
can be used in the future to isolate options that should only be
considered for embedded systems (RISC architectures, SLOB, etc).

Calling the option "EXPERT" more accurately represents its intention: only
expert users who understand the impact of the configuration changes they
are making should enable it.

Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: David Woodhouse <david.woodhouse@intel.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Greg KH <gregkh@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Robin Holt <holt@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:05 -08:00
Roland Dreier
4790f4dc5f Merge branches 'misc', 'mlx4', 'mthca', 'nes' and 'srp' into for-next 2011-01-16 21:22:41 -08:00
Tejun Heo
f06267104d RDMA: Update workqueue usage
* ib_wq is added, which is used as the common workqueue for infiniband
  instead of the system workqueue.  All system workqueue usages
  including flush_scheduled_work() callers are converted to use and
  flush ib_wq.

* cancel_delayed_work() + flush_scheduled_work() converted to
  cancel_delayed_work_sync().

* qib_wq is removed and ib_wq is used instead.

This is to prepare for deprecation of flush_scheduled_work().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-16 21:16:31 -08:00
Bart Van Assche
695b83495e IB/srp: Test only once whether iu allocation succeeded
Merge the two tests in srp_queuecommand() of whether information unit
allocation succeeded into one.  An intended side effect of this change
is that we fix the warning:

    drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_queuecommand':
    drivers/infiniband/ulp/srp/ib_srp.c:1116: warning: 'req' may be used uninitialized in this function

(seen with CONFIG_CC_OPTIMIZE_FOR_SIZE=y at least with gcc 4.4.4)

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-13 14:00:43 -08:00
Joe Perches
948579cd8c RDMA: Use vzalloc() to replace vmalloc()+memset(0)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-12 11:11:58 -08:00
Roland Dreier
2b76c05794 Merge branches 'cxgb4', 'ipath', 'ipoib', 'mlx4', 'mthca', 'nes', 'qib' and 'srp' into for-next 2011-01-10 17:43:30 -08:00
Or Gerlitz
8ae31e5b1f IPoIB: Add GRO support
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:41:55 -08:00
Or Gerlitz
19e364f680 IPoIB: Remove LRO support
As a first step in moving from LRO to GRO, revert commit af40da894e
("IPoIB: add LRO support").  Also eliminate the ethtool set_flags
callback which isn't needed anymore.  Finally, we need to include
<linux/sched.h> directly to get the declaration of restart_syscall()
(which used to be included implicitly through <linux/inet_lro.h>).

Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10 17:41:54 -08:00
David Dillow
9af762719e IB/srp: consolidate hot-path variables into cache lines
Put the variables accessed together in the hot-path into common
cachelines, and separate them by RW vs RO to avoid false dirtying.
We keep a local copy of the lkey and rkey in the target to avoid
traversing pointers (and associated cache lines) to find them.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-10 15:44:51 -05:00
Bart Van Assche
e968467822 IB/srp: stop sharing the host lock with SCSI
We don't need protection against the SCSI stack, so use our own lock to
allow parallel progress on separate CPUs.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out and small cleanups by David Dillow ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-10 15:44:50 -05:00
Bart Van Assche
94a9174c63 IB/srp: reduce lock coverage of command completion
We only need the lock to cover list and credit manipulations, so push
those into srp_remove_req() and update the call chains.

We reorder the request removal and command completion in
srp_process_rsp() to avoid the SCSI mid-layer sending another command
before we've released our request and added any credits returned by the
target. This prevents us from returning HOST_BUSY unneccesarily.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out, small cleanups, and modified to avoid potential extraneous
  HOST_BUSY returns by David Dillow ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-10 15:44:50 -05:00
Bart Van Assche
76c75b258f IB/srp: reduce local coverage for command submission and EH
We only need locks to protect our lists and number of credits available.
By pre-consuming the credit for the request, we can reduce our lock
coverage to just those areas. If we don't actually send the request,
we'll need to put the credit back into the pool.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out and small cleanups by David Dillow ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-10 15:44:49 -05:00
Bart Van Assche
536ae14e75 IB/srp: don't move active requests to their own list
We use req->scmnd != NULL to indicate an active request, so there's no
need to keep a separate list for them. We can afford the array iteration
during error handling, and dropping it gives us one less item that needs
lock protection.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out and small cleanups by David Dillow ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-10 15:44:42 -05:00
Bart Van Assche
dcb4cb85f4 IB/srp: allow lockless work posting
Only one CPU at a time will own an RX IU, so using the address of the IU
as the work request cookie allows us to avoid taking a lock. We can
similarly prepare the TX path for lockless posting by moving the free TX
IUs to a list. This also removes the requirement that the queue sizes be
a power of 2.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out, small cleanups, and modified to avoid needing an extra field
  in the IU by David Dillow]
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-05 15:24:25 -05:00
Bart Van Assche
9709f0e05b IB/srp: consolidate state change code
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out and small cleanups by David Dillow ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-05 15:24:25 -05:00
David Dillow
f8b6e31e4e IB/srp: allow task management without a previous request
We can only have one task management comment outstanding, so move the
completion and status to the target port. This allows us to handle
resets of a LUN without a corresponding request having been sent.
Meanwhile, we don't need to play games with host_scribble, just use it
as the pointer it is.

This fixes a crash when we issue a bus reset using sg_reset.

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=13893
Reported-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-01-05 15:24:25 -05:00
Jeff Garzik
f281233d3e SCSI host lock push-down
Move the mid-layer's ->queuecommand() invocation from being locked
with the host lock to being unlocked to facilitate speeding up the
critical path for drivers who don't need this lock taken anyway.

The patch below presents a simple SCSI host lock push-down as an
equivalent transformation.  No locking or other behavior should change
with this patch.  All existing bugs and locking orders are preserved.

Additionally, add one parameter to queuecommand,
	struct Scsi_Host *
and remove one parameter from queuecommand,
	void (*done)(struct scsi_cmnd *)

Scsi_Host* is a convenient pointer that most host drivers need anyway,
and 'done' is redundant to struct scsi_cmnd->scsi_done.

Minimal code disturbance was attempted with this change.  Most drivers
needed only two one-line modifications for their host lock push-down.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Acked-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-16 13:33:23 -08:00
Linus Torvalds
9e5fca251f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (63 commits)
  IB/qib: clean up properly if pci_set_consistent_dma_mask() fails
  IB/qib: Allow driver to load if PCIe AER fails
  IB/qib: Fix uninitialized pointer if CONFIG_PCI_MSI not set
  IB/qib: Fix extra log level in qib_early_err()
  RDMA/cxgb4: Remove unnecessary KERN_<level> use
  RDMA/cxgb3: Remove unnecessary KERN_<level> use
  IB/core: Add link layer type information to sysfs
  IB/mlx4: Add VLAN support for IBoE
  IB/core: Add VLAN support for IBoE
  IB/mlx4: Add support for IBoE
  mlx4_en: Change multicast promiscuous mode to support IBoE
  mlx4_core: Update data structures and constants for IBoE
  mlx4_core: Allow protocol drivers to find corresponding interfaces
  IB/uverbs: Return link layer type to userspace for query port operation
  IB/srp: Sync buffer before posting send
  IB/srp: Use list_first_entry()
  IB/srp: Reduce number of BUSY conditions
  IB/srp: Eliminate two forward declarations
  IB/mlx4: Signal node desc changes to SM by using FW to generate trap 144
  IB: Replace EXTRA_CFLAGS with ccflags-y
  ...
2010-10-26 17:54:22 -07:00
Hagen Paul Pfeifer
732eacc054 replace nested max/min macros with {max,min}3 macro
Use the new {max,min}3 macros to save some cycles and bytes on the stack.
This patch substitutes trivial nested macros with their counterpart.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Joe Perches <joe@perches.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:12 -07:00
Roland Dreier
116e9535fe Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'ehca', 'iboe', 'ipoib', 'misc', 'mlx4', 'nes', 'qib' and 'srp' into for-next 2010-10-26 16:09:11 -07:00
David Dillow
19081f31ce IB/srp: Sync buffer before posting send
srp_send_tsk_mgmt() was missing the proper DMA sync calls before posting
the buffer to the device.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-24 22:14:23 -07:00
Bart Van Assche
21c1a90769 IB/srp: Use list_first_entry()
Use the list_first_entry() macro in ib_srp instead of open-coding the equivalent,
which makes the source code slightly more descriptive.  The list_first_entry()
macro itself was introduced in kernel 2.6.22.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-24 22:14:19 -07:00
Bart Van Assche
7ade400aba IB/srp: Reduce number of BUSY conditions
As proposed by the SRP (draft) standard, ib_srp reserves one ring
element for SRP_TSK_MGMT requests. This patch makes sure that the SCSI
mid-layer never tries to queue more than (SRP request limit) - 1 SCSI
commands to ib_srp. This improves performance for targets whose request
limit is less than or equal to SRP_NORMAL_REQ_SQ_SIZE by reducing the
number of BUSY responses reported by ib_srp to the SCSI mid-layer.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-24 22:14:14 -07:00
David Dillow
05a1d7504f IB/srp: Eliminate two forward declarations
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-24 22:14:08 -07:00
Eli Cohen
c3aa9b186b IPoIB: Set dev_id field of net_device
Use the net device's dev_id field to encode the port number of the pci
device.  This can be used to to associate a net device with the pci
device's port. The encoding is: dev_id = port - 1.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-23 13:35:48 -07:00
David Dillow
bb12588a38 IB/srp: Implement SRP_CRED_REQ and SRP_AER_REQ
This patch adds support for SRP_CRED_REQ to avoid a lockup by targets
that use that mechanism to return credits to the initiator. This
prevents a lockup observed in the field where we would never add the
credits from the SRP_CRED_REQ to our current count, and would therefore
never send another command to the target.

Minimal support for SRP_AER_REQ is also added, as these messages can
also be used to convey additional credits to the initiator.

Based upon extensive debugging and code by Bart Van Assche and a bug
report by Chris Worley.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-22 22:19:10 -07:00
Bart Van Assche
dd5e6e38b2 IB/srp: Preparation for transmit ring response allocation
The transmit ring in ib_srp (srp_target.tx_ring) is currently only used
for allocating requests sent by the initiator to the target. This patch
prepares using that ring for allocation of both requests and responses.
Also, this patch differentiates the uses of SRP_SQ_SIZE, increases the
size of the IB send completion queue by one element and reserves one
transmit ring slot for SRP_TSK_MGMT requests.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-22 22:19:10 -07:00
Justin P. Mattock
631dd1a885 Update broken web addresses in the kernel.
The patch below updates broken web addresses in the kernel

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Dimitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Acked-by: Ben Pfaff <blp@cs.stanford.edu>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Reviewed-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-10-18 11:03:14 +02:00
Eli Cohen
7b4c876961 IPoIB: Skip IBoE ports
IPoIB is IP-over-Infiniband link layer. In the case of IBoE, the link
layer is Ethernet and IP can work directly over Ethernet, so disable
IPoIB for non-IB_LINK_LAYER_INFINIBAND ports.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-13 09:38:11 -07:00
Christoph Lameter
fed1db33fe IPoIB: Set pkt_type correctly for multicast packets (fix IGMP breakage)
IGMP processing is broken because the IPOIB does not set the
skb->pkt_type the right way for multicast traffic.  All incoming
packets are set to PACKET_HOST which means that igmp_recv() will
ignore the IGMP broadcasts/multicasts.

This in turn means that the IGMP timers are firing and are sending
information about multicast subscriptions unnecessarily.  In a large
private network this can cause traffic spikes.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 11:09:23 -07:00
Linus Torvalds
3cc08fc35d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (42 commits)
  IB/qib: Add missing <linux/slab.h> include
  IB/ehca: Drop unnecessary NULL test
  RDMA/nes: Fix confusing if statement indentation
  IB/ehca: Init irq tasklet before irq can happen
  RDMA/nes: Fix misindented code
  RDMA/nes: Fix showing wqm_quanta
  RDMA/nes: Get rid of "set but not used" variables
  RDMA/nes: Read firmware version from correct place
  IB/srp: Export req_lim via sysfs
  IB/srp: Make receive buffer handling more robust
  IB/srp: Use print_hex_dump()
  IB: Rename RAW_ETY to RAW_ETHERTYPE
  RDMA/nes: Fix two sparse warnings
  RDMA/cxgb3: Make needlessly global iwch_l2t_send() static
  IB/iser: Make needlessly global iser_alloc_rx_descriptors() static
  RDMA/cxgb4: Add timeouts when waiting for FW responses
  IB/qib: Fix race between qib_error_qp() and receive packet processing
  IB/qib: Limit the number of packets processed per interrupt
  IB/qib: Allow writes to the diag_counters to be able to clear them
  IB/qib: Set cfgctxts to number of CPUs by default
  ...
2010-08-07 17:08:02 -07:00
Roland Dreier
03b37ecdb3 Merge branches 'cxgb3', 'cxgb4', 'ehca', 'ipath', 'misc', 'nes', 'qib' and 'srp' into for-next 2010-08-05 14:27:14 -07:00
Bart Van Assche
89de74866b IB/srp: Export req_lim via sysfs
Export req_lim via sysfs for debugging.

Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com>
Acked-by: David Dillow <dave@thedillows.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-08-04 13:02:26 -07:00
Linus Torvalds
6ba74014c1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1443 commits)
  phy/marvell: add 88ec048 support
  igb: Program MDICNFG register prior to PHY init
  e1000e: correct MAC-PHY interconnect register offset for 82579
  hso: Add new product ID
  can: Add driver for esd CAN-USB/2 device
  l2tp: fix export of header file for userspace
  can-raw: Fix skb_orphan_try handling
  Revert "net: remove zap_completion_queue"
  net: cleanup inclusion
  phy/marvell: add 88e1121 interface mode support
  u32: negative offset fix
  net: Fix a typo from "dev" to "ndev"
  igb: Use irq_synchronize per vector when using MSI-X
  ixgbevf: fix null pointer dereference due to filter being set for VLAN 0
  e1000e: Fix irq_synchronize in MSI-X case
  e1000e: register pm_qos request on hardware activation
  ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice
  net: Add getsockopt support for TCP thin-streams
  cxgb4: update driver version
  cxgb4: add new PCI IDs
  ...

Manually fix up conflicts in:
 - drivers/net/e1000e/netdev.c: due to pm_qos registration
   infrastructure changes
 - drivers/net/phy/marvell.c: conflict between adding 88ec048 support
   and cleaning up the IDs
 - drivers/net/wireless/ipw2x00/ipw2100.c: trivial ipw2100_pm_qos_req
   conflict (registration change vs marking it static)
2010-08-04 11:47:58 -07:00
Bart Van Assche
c996bb47bb IB/srp: Make receive buffer handling more robust
The current strategy in ib_srp for posting receive buffers is:

 * Post one buffer after channel establishment.
 * Post one buffer before sending an SRP_CMD or SRP_TSK_MGMT to the target.

As a result, only the first non-SRP_RSP information unit from the
target will be processed.  If that first information unit is an
SRP_T_LOGOUT, it will be processed.  On the other hand, if the
initiator receives an SRP_CRED_REQ or SRP_AER_REQ before it receives a
SRP_T_LOGOUT, the SRP_T_LOGOUT won't be processed.

We can fix this inconsistency by changing the strategy for posting
receive buffers to:

 * Post all receive buffers after channel establishment.
 * After a receive buffer has been consumed and processed, post it again.

A side effect is that the ib_post_recv() call is moved out of the SCSI
command processing path.  Since __srp_post_recv() is not called
directly any more, get rid of it and move the code directly into
srp_post_recv().  Also, move srp_post_recv() up in the file to avoid a
forward declaration.

Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com>
Acked-by: David Dillow <dave@thedillows.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-08-04 11:47:39 -07:00
Bart Van Assche
7a7008110b IB/srp: Use print_hex_dump()
Replace an open-coded dump of the receive buffer with a call to
print_hex_dump().

Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-08-04 11:24:12 -07:00
Or Gerlitz
48d8fcebb7 IB/iser: Make needlessly global iser_alloc_rx_descriptors() static
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-08-04 09:54:55 -07:00
Or Gerlitz
7a52b34b07 IPoIB: Fix world-writable child interface control sysfs attributes
Sumeet Lahorani <sumeet.lahorani@oracle.com> reported that the IPoIB
child entries are world-writable; however we don't want ordinary users
to be able to create and destroy child interfaces, so fix them to be
writable only by root.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-07-06 14:23:22 -07:00
Ben Hutchings
39827be26b IB/{nes, ipoib}: Pass supported flags to ethtool_op_set_flags()
Following commit 1437ce3983 "ethtool:
Change ethtool_op_set_flags to validate flags", ethtool_op_set_flags
takes a third parameter and cannot be used directly as an
implementation of ethtool_ops::set_flags.

Changes nes and ipoib driver to pass in the appropriate value.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-04 11:48:14 -07:00
Linus Torvalds
f8965467f3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1674 commits)
  qlcnic: adding co maintainer
  ixgbe: add support for active DA cables
  ixgbe: dcb, do not tag tc_prio_control frames
  ixgbe: fix ixgbe_tx_is_paused logic
  ixgbe: always enable vlan strip/insert when DCB is enabled
  ixgbe: remove some redundant code in setting FCoE FIP filter
  ixgbe: fix wrong offset to fc_frame_header in ixgbe_fcoe_ddp
  ixgbe: fix header len when unsplit packet overflows to data buffer
  ipv6: Never schedule DAD timer on dead address
  ipv6: Use POSTDAD state
  ipv6: Use state_lock to protect ifa state
  ipv6: Replace inet6_ifaddr->dead with state
  cxgb4: notify upper drivers if the device is already up when they load
  cxgb4: keep interrupts available when the ports are brought down
  cxgb4: fix initial addition of MAC address
  cnic: Return SPQ credit to bnx2x after ring setup and shutdown.
  cnic: Convert cnic_local_flags to atomic ops.
  can: Fix SJA1000 command register writes on SMP systems
  bridge: fix build for CONFIG_SYSFS disabled
  ARCNET: Limit com20020 PCI ID matches for SOHARD cards
  ...

Fix up various conflicts with pcmcia tree drivers/net/
{pcmcia/3c589_cs.c, wireless/orinoco/orinoco_cs.c and
wireless/orinoco/spectrum_cs.c} and feature removal
(Documentation/feature-removal-schedule.txt).

Also fix a non-content conflict due to pm_qos_requirement getting
renamed in the PM tree (now pm_qos_request) in net/mac80211/scan.c
2010-05-20 21:04:44 -07:00
Roland Dreier
ffebedb7ab Merge branches 'amso1100', 'bkl', 'cma', 'cxgb3', 'cxgb4', 'ipoib', 'iser', 'masked-atomics', 'misc', 'mthca' and 'nes' into for-next 2010-05-15 20:06:01 -07:00
Dan Carpenter
9fda1ac5fa IB/iser: Fix error flow in iser_create_ib_conn_res()
We shouldn't free things here because we free them later.
The call tree looks like this:
	iser_connect() ==> initiating the connection establishment
and later
	iser_cma_handler() => iser_route_handler() => iser_create_ib_conn_res()
if we fail here, eventually iser_conn_release() is called, resulting
in a double free.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-05-12 09:30:45 -07:00
Or Gerlitz
39ff05dbbb IB/iser: Enhance disconnection logic for multi-pathing
The iser connection teardown flow isn't over until the underlying
Connection Manager (e.g the IB CM) delivers a disconnected or timeout
event through the RDMA-CM.  When the remote (target) side isn't
reachable, e.g when some HW e.g port/hca/switch isn't functioning or
taken down administratively, the CM timeout flow is used and the event
may be generated only after relatively long time -- on the order of
tens of seconds.

The current iser code exposes this possibly long delay to higher
layers, specifically to the iscsid daemon and iscsi kernel stack. As a
result, the iscsi stack doesn't respond well: this low-level CM delay
is added to the fail-over time under HA schemes such as the one
provided by DM multipath through the multipathd(8) service.

This patch enhances the reference counting scheme on iser's IB
connections so that the disconnect flow initiated by iscsid from user
space (ep_disconnect) doesn't wait for the CM to deliver the
disconnect/timeout event.  (The connection teardown isn't done from
iser's view point until the event is delivered)

The iser ib (rdma) connection object is destroyed when its reference
count reaches zero.  When this happens on the RDMA-CM callback
context, extra care is taken so that the RDMA-CM does the actual
destroying of the associated ID, since doing it in the callback is
prohibited.

The reference count of iser ib connection normally reaches three,
where the <ref, deref> relations are

 1. conn <init, terminate>
 2. conn <bind, stop/destroy>
 3. cma id <create, disconnect/error/timeout callbacks>

With this patch, multipath fail-over time is about 30 seconds, while
without this patch, multipath fail-over time is about 130 seconds.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-05-12 09:30:44 -07:00
Or Gerlitz
d265b98082 IB/iser: Remove buggy back-pointer setting
The iscsi connection object life cycle includes binding and unbinding
(conn_stop) to/from the iscsi transport connection object.  Since
iscsi connection objects are recycled, at the time the transport
connection (e.g iser's IB connection) is released, it is not valid to
touch the iscsi connection tied to the transport back-pointer since it
may already point to a different transport connection.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-05-12 09:30:44 -07:00
Or Gerlitz
2110f9bf37 IB/iser: Add asynchronous event handler
Add handler to handle events such as port up and down.  This is useful
when testing high-availability schemes such as multi-pathing.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-05-12 09:30:43 -07:00
Or Gerlitz
d414371795 IPoIB: Allow disabling/enabling TSO on the fly through ethtool
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-04-21 17:17:30 -07:00
David S. Miller
871039f02f Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/stmmac/stmmac_main.c
	drivers/net/wireless/wl12xx/wl1271_cmd.c
	drivers/net/wireless/wl12xx/wl1271_main.c
	drivers/net/wireless/wl12xx/wl1271_spi.c
	net/core/ethtool.c
	net/mac80211/scan.c
2010-04-11 14:53:53 -07:00
David S. Miller
4a35ecf8bf Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bonding/bond_main.c
	drivers/net/via-velocity.c
	drivers/net/wireless/iwlwifi/iwl-agn.c
2010-04-06 23:53:30 -07:00
Jiri Pirko
22bedad3ce net: convert multicast list to list_head
Converts the list and the core manipulating with it to be the same as uc_list.

+uses two functions for adding/removing mc address (normal and "global"
 variant) instead of a function parameter.
+removes dev_mcast.c completely.
+exposes netdev_hw_addr_list_* macros along with __hw_addr_* functions for
 manipulation with lists on a sandbox (used in bonding and 80211 drivers)

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-03 14:22:15 -07:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Jiri Pirko
3e4aa12f8a ipoib: remove addrlen check for mc addresses
Finally this bit can be removed. Currently, after the bonding driver is
changed/fixed (32a806c194 net-next-2.6),
that's not possible for an addr with different length than dev->addr_len
to be present in list. Removing this check as in new mc_list there will be
no addrlen in the record.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-22 18:33:11 -07:00
Linus Torvalds
961cde93de Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (69 commits)
  [SCSI] scsi_transport_fc: Fix synchronization issue while deleting vport
  [SCSI] bfa: Update the driver version to 2.1.2.1.
  [SCSI] bfa: Remove unused header files and did some cleanup.
  [SCSI] bfa: Handle SCSI IO underrun case.
  [SCSI] bfa: FCS and include file changes.
  [SCSI] bfa: Modified the portstats get/clear logic
  [SCSI] bfa: Replace bfa_get_attr() with specific APIs
  [SCSI] bfa: New portlog entries for events (FIP/FLOGI/FDISC/LOGO).
  [SCSI] bfa: Rename pport to fcport in BFA FCS.
  [SCSI] bfa: IOC fixes, check for IOC down condition.
  [SCSI] bfa: In MSIX mode, ignore spurious RME interrupts when FCoE ports are in FW mismatch state.
  [SCSI] bfa: Fix Command Queue (CPE) full condition check and ack CPE interrupt.
  [SCSI] bfa: IOC recovery fix in fcmode.
  [SCSI] bfa: AEN and byte alignment fixes.
  [SCSI] bfa: Introduce a link notification state machine.
  [SCSI] bfa: Added firmware save clear feature for BFA driver.
  [SCSI] bfa: FCS authentication related changes.
  [SCSI] bfa: PCI VPD, FIP and include file changes.
  [SCSI] bfa: Fix to copy fpma MAC when requested by user space application.
  [SCSI] bfa: RPORT state machine: direct attach mode fix.
  ...
2010-03-18 16:54:31 -07:00
Or Gerlitz
a48f509b26 IPoIB: Include return code in trace message for ib_post_send() failures
Print the return code of ib_post_send() if it fails to make these
debugging messages more useful.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-03-11 13:43:11 -08:00
Eli Cohen
f0dc117abd IPoIB: Fix TX queue lockup with mixed UD/CM traffic
The IPoIB UD QP reports send completions to priv->send_cq, which is
usually left unarmed; it only gets armed when the number of
outstanding send requests reaches the size of the TX queue. This
arming is done only in the send path for the UD QP.  However, when
sending CM packets, the net queue may be stopped for the same reasons
but no measures are taken to recover the UD path from a lockup.

Consider this scenario: a host sends high rate of both CM and UD
packets, with a TX queue length of N.  If at some time the number of
outstanding UD packets is more than N/2 and the overall outstanding
packets is N-1, and CM sends a packet (making the number of
outstanding sends equal N), the TX queue will be stopped.  When all
the CM packets complete, the number of outstanding packets will still
be higher than N/2 so the TX queue will not be restarted.

Fix this by calling ib_req_notify_cq() when the queue is stopped in
the CM path.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-03-11 13:37:11 -08:00
Linus Torvalds
3ff1562ea4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (48 commits)
  IB/srp: Clean up error path in srp_create_target_ib()
  IB/srp: Split send and recieve CQs to reduce number of interrupts
  RDMA/nes: Add support for KR device id 0x0110
  IB/uverbs: Use anon_inodes instead of private infinibandeventfs
  IB/core: Fix and clean up ib_ud_header_init()
  RDMA/cxgb3: Mark RDMA device with CXIO_ERROR_FATAL when removing
  RDMA/cxgb3: Don't allocate the SW queue for user mode CQs
  RDMA/cxgb3: Increase the max CQ depth
  RDMA/cxgb3: Doorbell overflow avoidance and recovery
  IB/core: Pack struct ib_device a little tighter
  IB/ucm: Clean whitespace errors
  IB/ucm: Increase maximum devices supported
  IB/ucm: Use stack variable 'base' in ib_ucm_add_one
  IB/ucm: Use stack variable 'devnum' in ib_ucm_add_one
  IB/umad: Clean whitespace
  IB/umad: Increase maximum devices supported
  IB/umad: Use stack variable 'base' in ib_umad_init_port
  IB/umad: Use stack variable 'devnum' in ib_umad_init_port
  IB/umad: Remove port_table[]
  IB/umad: Convert *cdev to cdev in struct ib_umad_port
  ...
2010-03-03 07:33:17 -08:00
Jayamohan Kallickal
309ce156aa [SCSI] libiscsi: Make iscsi_eh_target_reset start with session reset
The iscsi_eh_target_reset has been modified to attempt
target reset only. If it fails, then iscsi_eh_session_reset
will be called.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Jayamohan Kallickal <jayamohank@serverengines.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-03-03 17:39:04 +05:30
Roland Dreier
f57721507f Merge branch 'srp' into for-next 2010-03-01 23:51:57 -08:00
Roland Dreier
5c2187f0a1 Merge branch 'iser' into for-next 2010-03-01 23:51:56 -08:00
Roland Dreier
da9d2f0730 IB/srp: Clean up error path in srp_create_target_ib()
Instead of repeating the error unwinding steps in each place an error
can be detected, use the common idiom of gotos into an error flow.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-03-01 23:51:39 -08:00
Bart Van Assche
9c03dc9f19 IB/srp: Split send and recieve CQs to reduce number of interrupts
We can reduce the number of IB interrupts from two interrupts per
srp_queuecommand() call to one by using separate CQs for send and
receive completions and processing send completions by polling every
time a TX IU is allocated.

Receive completion events still trigger an interrupt.

Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-03-01 23:51:38 -08:00
Jiri Pirko
6c74651c3b ipoib: returned back addrlen check for mc addresses
Apparently bogus mc address can break IPOIB multicast processing. Therefore
returning the check for addrlen back until this is resolved in bonding (I don't
see any other point from where mc address with non-dev->addr_len length can came
from).

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-28 01:34:28 -08:00
Jiri Pirko
fbf219f1c8 infiniband: convert to use netdev_for_each_mc_addr
Due to the loop complexicity in nes_nic.c, I'm using char* to copy mc addresses
to it.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-26 04:22:27 -08:00
Or Gerlitz
88ec415772 IB/iser: Remove redundant locking from iser scsi command response flow
Currently the iSER receive completion flow takes the session lock
twice.  Optimize it to avoid the first one by letting
iser_task_rdma_finalize() be called only from the cleanup_task
callback invoked by iscsi_free_task, thus reducing the contention on
the session lock between the scsi command submission to the scsi
command completion flows.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 09:41:14 -08:00
Or Gerlitz
962b4b528b IB/iser: Use libiscsi passthrough mode
libiscsi passthrough mode invokes the transport xmit calls directly
without first going through an internal queue, unlike the other mode,
which uses a queue and a xmitworker thread.  Now that the "cant_sleep"
prerequisite of iscsi_host_alloc is met, move to use it.  Handling
xmit errors is now done by the passthrough flow of libiscsi.  Since
the queue/worker aren't used in this mode, the code that schedules the
xmitworker is removed.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 09:41:14 -08:00
Or Gerlitz
aae3c995ff IB/iser: Remove unnecessary connection checks
Remove unnecessary checks for the IB connection state and for QP
overflow, as conn state changes are reported by iSER to libiscsi and
handled there. QP overflow is theoretically possible only when
unsolicited data-outs are used; anyway it's being checked and handled
by HW drivers.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 09:41:13 -08:00
Or Gerlitz
528f4e8c83 IB/iser: Use atomic allocations
Two minor flows in iSER's data path still use allocations; move them
to be atomic as a preperation step towards moving to use libiscsi
passthrough mode.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 09:41:12 -08:00
Or Gerlitz
f19624aa92 IB/iser: Simplify send flow/descriptors
Simplify and shrink the logic/code used for the send descriptors.
Changes include removing struct iser_dto (an unnecessary abstraction),
using struct iser_regd_buf only for handling SCSI commands, using
dma_sync instead of dma_map/unmap, etc.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 09:41:12 -08:00
Or Gerlitz
78ad0a34dc IB/iser: Use different CQ for send completions
Use a different CQ for send completions, where send completions are
polled by the interrupt-driven receive completion handler.  Therefore,
interrupts aren't used for the send CQ.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 09:41:11 -08:00
Or Gerlitz
704315f082 IB/iser: Remove atomic counter for posted receive buffers
Now that both the posting and reaping of receive buffers is done in
the completion path, the counter of outstanding buffers not be atomic.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 09:41:11 -08:00
Or Gerlitz
bcc60c381d IB/iser: New receive buffer posting logic
Currently, the recv buffer posting logic is based on the transactional
nature of iSER which allows for posting a buffer before sending a PDU.
Change this to post only when the number of outstanding recv buffers
is below a water mark and in a batched manner, thus simplifying and
optimizing the data path.  Use a pre-allocated ring of recv buffers
instead of allocating from kmem cache.  A special treatment is given
to the login response buffer whose size must be 8K unlike the size of
buffers used for any other purpose which is 128 bytes.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 09:41:10 -08:00
Or Gerlitz
1cef465985 IB/iser: Revert commit bba7ebb "avoid recv buffer exhaustion"
We will make a major change in the recv buffer posting logic, after
which the problem commit bba7ebb "avoid recv buffer exhaustion caused
by unexpected PDUs" comes to solve doesn't exist any more, so revert it.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 09:41:09 -08:00
Alexey Dobriyan
3ffe533c87 ipv6: drop unused "dev" arg of icmpv6_send()
Dunno, what was the idea, it wasn't used for a long time.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-18 14:30:17 -08:00
Or Gerlitz
757bebb3f9 IPoIB: Remove TX moderation settings from ethtool support
As of commit f56bcd8 ("IPoIB: Use separate CQ for UD send
completions"), there are no TX interrupts.  Change the ethtool code
not to report TX moderation settings, so users will not be misled to
think they can control TX interrupt moderation.  Pointed out by Alex
Vainman <alexv@voltaire.com>

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-11 15:40:40 -08:00
Linus Torvalds
e69381b417 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (45 commits)
  RDMA/cxgb3: Fix error paths in post_send and post_recv
  RDMA/nes: Fix stale ARP issue
  RDMA/nes: FIN during MPA startup causes timeout
  RDMA/nes: Free kmap() resources
  RDMA/nes: Check for zero STag
  RDMA/nes: Fix Xansation test crash on cm_node ref_count
  RDMA/nes: Abnormal listener exit causes loopback node crash
  RDMA/nes: Fix crash in nes_accept()
  RDMA/nes: Resource not freed for REJECTed connections
  RDMA/nes: MPA request/response error checking
  RDMA/nes: Fix query of ORD values
  RDMA/nes: Fix MAX_CM_BUFFER define
  RDMA/nes: Pass correct size to ioremap_nocache()
  RDMA/nes: Update copyright and branding string
  RDMA/nes: Add max_cqe check to nes_create_cq()
  RDMA/nes: Clean up struct nes_qp
  RDMA/nes: Implement IB_SIGNAL_ALL_WR as an iWARP extension
  RDMA/nes: Add additional SFP+ PHY uC status check and PHY reset
  RDMA/nes: Correct fast memory registration implementation
  IB/ehca: Fix error paths in post_send and post_recv
  ...
2009-12-16 10:32:31 -08:00
Roland Dreier
14f369d1d6 Merge branches 'amso1100', 'cma', 'cxgb3', 'ehca', 'ipath', 'ipoib', 'iser', 'misc', 'mlx4' and 'nes' into for-next 2009-12-15 23:39:25 -08:00
Linus Torvalds
4ef58d4e2a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits)
  tree-wide: fix misspelling of "definition" in comments
  reiserfs: fix misspelling of "journaled"
  doc: Fix a typo in slub.txt.
  inotify: remove superfluous return code check
  hdlc: spelling fix in find_pvc() comment
  doc: fix regulator docs cut-and-pasteism
  mtd: Fix comment in Kconfig
  doc: Fix IRQ chip docs
  tree-wide: fix assorted typos all over the place
  drivers/ata/libata-sff.c: comment spelling fixes
  fix typos/grammos in Documentation/edac.txt
  sysctl: add missing comments
  fs/debugfs/inode.c: fix comment typos
  sgivwfb: Make use of ARRAY_SIZE.
  sky2: fix sky2_link_down copy/paste comment error
  tree-wide: fix typos "couter" -> "counter"
  tree-wide: fix typos "offest" -> "offset"
  fix kerneldoc for set_irq_msi()
  spidev: fix double "of of" in comment
  comment typo fix: sybsystem -> subsystem
  ...
2009-12-09 19:43:33 -08:00
David J. Wilder
0cd4d0fd9b IPoIB: Clear ipoib_neigh.dgid in ipoib_neigh_alloc()
IPoIB can miss a change in destination GID under some conditions.  The
problem is caused when ipoib_neigh->dgid contains a stale address.
The fix is to set ipoib_neigh->dgid to zero in ipoib_neigh_alloc().

This can happen when a system using bonding on its IPoIB interfaces
has switched its active interface from interface A to B and back to A.
The system that fails over will not correctly processes the 2nd
address change, as described below.

When an address has changed neighbor->ha is updated with the new
address.  Each neighbor has an associated ipoib_neigh.
ipoib_neigh->dgid also holds a copy of the remote node's hardware
address.  When an address changes neighbor->ha is updated by the
network layer (arp code) with the new address.  IPoIB detects this
change in ipoib_start_xmit() by comparing neighbor->ha with
ipoib_neigh->dgid.  The bug is that ipoib_neigh->dgid may already
contain the new address (A) thus the change from B to A is missed by
ipoib.  Here is the sequence of events:

    ipoib_neigh->dgid = A  and  neighbor->ha = A

The address is switched to B (the first switch)

    neighbor->ha = B

The change is seen in ipoib_start_xmit() -- neighbor->ha !=
ipoib_neigh->dgid so ipoib_neigh is released, and a new one is
allocated.

The allocator may return the same chunk of memory that was just
released, therefore ipoib_neigh->dgid still contains A at this point.

ipoib_neigh->dgid should be updated in neigh_add_path(), but if the
following conditions are true dgid is not updated:

        1) __path_find() returns a path
        2) path->ah is NULL

The remote system now switches from address B to A, neighbor->ha is
updated to A.

Now we have again : ipoib_neigh->dgid = A  and  neighbor->ha = A

Since the addresses are the same ipoib won't process the change in
address.  Fix this by zeroing out the dgid field when allocating a new
struct ipoib_neigh.

Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 10:03:00 -08:00
Mike Christie
b20d038dff [SCSI] iser: set tgt and lu reset timeout
When iser enabled lu reset support it did not set the
bit to allow userspace to get/set the timeout. This
sets the tgt and lu reset timeout bits.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-12-04 12:01:38 -06:00
Thadeu Lima de Souza Cascardo
94e2bd6888 tree-wide: fix some typos and punctuation in comments
fix some typos and punctuation in comments

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-12-04 15:39:48 +01:00
Or Gerlitz
c1ccaf2478 IB/iser: Rewrite SG handling for RDMA logic
After dma-mapping an SG list provided by the SCSI midlayer, iser has
to make sure the mapped SG is "aligned for RDMA" in the sense that its
possible to produce one mapping in the HCA IOMMU which represents the
whole SG. Next, the mapped SG is formatted for registration with the HCA.

This patch re-writes the logic that does the above, to make it clearer
and simpler. It also fixes a bug in the being aligned for RDMA checks,
where a "start" check wasn't done but rather only "end" check.

Signed-off-by: Alexander Nezhinsky <alexandern@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-12 11:32:27 -08:00
Jayamohan Kallickal
b8b9e1b812 [SCSI] libiscsi: iscsi_session_setup to allow for private space
This patch contains changes that allow iscsi_session_setup
to allocate private space for LLD's

Signed-off-by: Jayamohan Kallickal <jayamohank@serverengines.com>
Acked-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-10-02 14:01:39 -05:00
Linus Torvalds
d7757be133 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IPoIB: Don't turn on carrier for a non-active port
  IB/mthca: Fix access to freed memory in catastrophic event handling
  mlx4_core: Pass cache line size to device FW
  RDMA/nes: Remove duplicate .ndo_set_mac_address field initialization
  IB/mad: Fix lock-lock-timer deadlock in RMPP code
2009-09-24 17:06:01 -07:00
Moni Shoua
5ee9512084 IPoIB: Don't turn on carrier for a non-active port
Multicast joins can succeed even if the IB port is down.  This happens
when the SM runs on the same port with the requesting port.  However,
IPoIB calls netif_carrier_on() when the join of the broadcast group
succeeds, without caring about the state of the IB port.  The result
is an IPoIB interface in RUNNING state but without an active IB port
to support it.

If a bonding interface uses this IPoIB interface as a slave it might
not detect that this slave is almost useless and failover
functionality will be damaged.  The fix checks the state of the IB
port in the carrier_task before calling netif_carrier_on().

Adresses: https://bugs.openfabrics.org/show_bug.cgi?id=1726
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-24 12:01:05 -07:00
Linus Torvalds
d7e9660ad9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits)
  netxen: update copyright
  netxen: fix tx timeout recovery
  netxen: fix file firmware leak
  netxen: improve pci memory access
  netxen: change firmware write size
  tg3: Fix return ring size breakage
  netxen: build fix for INET=n
  cdc-phonet: autoconfigure Phonet address
  Phonet: back-end for autoconfigured addresses
  Phonet: fix netlink address dump error handling
  ipv6: Add IFA_F_DADFAILED flag
  net: Add DEVTYPE support for Ethernet based devices
  mv643xx_eth.c: remove unused txq_set_wrr()
  ucc_geth: Fix hangs after switching from full to half duplex
  ucc_geth: Rearrange some code to avoid forward declarations
  phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs
  drivers/net/phy: introduce missing kfree
  drivers/net/wan: introduce missing kfree
  net: force bridge module(s) to be GPL
  Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded
  ...

Fixed up trivial conflicts:

 - arch/x86/include/asm/socket.h

   converted to <asm-generic/socket.h> in the x86 tree.  The generic
   header has the same new #define's, so that works out fine.

 - drivers/net/tun.c

   fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that
   switched over to using 'tun->socket.sk' instead of the redundantly
   available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks
   to the TUN driver") which added a new 'tun->sk' use.

   Noted in 'next' by Stephen Rothwell.
2009-09-14 10:37:28 -07:00
Jason Gunthorpe
5e47596bee IPoIB: Check multicast address format
Check that the format of multicast link addresses is correct before
taking them from dev->mc_list to priv->multicast_list.  This way we
never try to send a bogus address to the SA, which prevents badness
from erronous 'ip maddr addr add', broken bonding drivers, etc.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:23:40 -07:00
Roland Dreier
721d67cdca IPoIB: Drop priv->lock before calling ipoib_send()
IPoIB currently must use irqsave locking for priv->lock, since it is
taken from interrupt context in one path.  However, ipoib_send() does
skb_orphan(), and the network stack locking is not IRQ-safe.
Therefore we need to make sure we don't hold priv->lock when calling
ipoib_send() to avoid lockdep warnings (the code was almost certainly
safe in practice, since the only code path that takes priv->lock from
interrupt context would never call into the network stack).

Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=13757
Reported-by: Bart Van Assche <bart.vanassche@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:23:40 -07:00
Roland Dreier
cd0bcf4cb9 IPoIB: Remove unused <rdma/ib_cache.h> includes
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:23:38 -07:00
Eric Dumazet
451f144398 drivers: Kill now superfluous ->last_rx stores
The generic packet receive code takes care of setting
netdev->last_rx when necessary, for the sake of the
bonding ARP monitor.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Neil Horman <nhorman@txudriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 23:07:36 -07:00
David S. Miller
9cbc1cb8cd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:
	Documentation/feature-removal-schedule.txt
	drivers/scsi/fcoe/fcoe.c
	net/core/drop_monitor.c
	net/core/net-traces.c
2009-06-15 03:02:23 -07:00
Eric Dumazet
adf30907d6 net: skb->dst accessors
Define three accessors to get/set dst attached to a skb

struct dst_entry *skb_dst(const struct sk_buff *skb)

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;

Delete skb->dst field

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-03 02:51:04 -07:00
Eric Dumazet
86d15cd833 net: unset IFF_XMIT_DST_RELEASE for qeth and ipoib
Last two drivers that need skb->dst in their start_xmit() function

Tell dev_hard_start_xmit() to no release it by unsetting  IFF_XMIT_DST_RELEASE

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-30 23:04:46 -07:00
Mike Christie
b3cd5050bf [SCSI] libiscsi: add task aborted state
If a task did not complete normally due to a TMF, libiscsi will
now complete the task with the state ISCSI_TASK_ABRT_TMF. Drivers
like bnx2i that need to free resources if a command did not complete normally
can then check the task state. If a driver does not need to send
a special command if we have dropped the session then they can check
for ISCSI_TASK_ABRT_SESS_RECOV.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-05-23 15:44:13 -05:00
Mike Christie
10eb0f013c [SCSI] iscsi: pass ep connect shost
When we create the tcp/ip connection by calling ep_connect, we currently
just go by the routing table info.

I think there are two problems with this.

1. Some drivers do not have access to a routing table. Some drivers like
qla4xxx do not even know about other ports.

2. If you have two initiator ports on the same subnet, the user may have
set things up so that session1 was supposed to be run through port1. and
session2 was supposed to be run through port2. It looks like we could
end with both sessions going through one of the ports.

Fixes for cxgb3i from Karen Xie.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-05-23 15:44:09 -05:00
Eric W. Biederman
26574401fe net: Fix ipoib rtnl_lock sysfs deadlock.
Network device sysfs files that grab the rtnl_lock unconditionally
will deadlock if accessed when the network device is being
unregistered.  So use trylock and syscall_restart to avoid this
deadlock.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 22:15:59 -07:00
Linus Torvalds
61bd1e858d Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: (53 commits)
  [SCSI] libosd: OSD2r05: on-the-wire changes for latest OSD2 revision 5.
  [SCSI] libosd: OSD2r05: OSD_CRYPTO_KEYID_SIZE will grow 20 => 32 bytes
  [SCSI] libosd: OSD2r05: Prepare for rev5 attribute list changes
  [SCSI] libosd: fix potential ERR_PTR dereference in osd_initiator.c
  [SCSI] mpt2sas : bump driver version to 01.100.02.00
  [SCSI] mpt2sas: fix hotplug event processing
  [SCSI] mpt2sas : release diagnotic buffers prior host reset
  [SCSI] mpt2sas : Broadcast Primative AEN bug fix
  [SCSI] mpt2sas : Identify Dell series-7 adapters at driver load time
  [SCSI] mpt2sas : driver name needs to be in the MPT2IOCINFO ioctl
  [SCSI] mpt2sas : running out of message frames
  [SCSI] mpt2sas : fix oops when firmware sends large sense buffer size
  [SCSI] mpt2sas : the sanity check in base_interrupt needs to be on dword boundary
  [SCSI] mpt2sas : unique ioctl magic number
  [SCSI] fix sign extension with 1.5TB usb-storage LBD=y
  [SCSI] ipr: Fix sleeping function called with interrupts disabled
  [SCSI] fcoe: fip: add multicast filter to receive FIP advertisements.
  [SCSI] libfc: Fix compilation warnings with allmodconfig
  [SCSI] fcoe: fix spelling typos and bad comments
  [SCSI] fcoe: don't export functions that are internal to fcoe
  ...
2009-05-02 16:36:34 -07:00
Mike Christie
6b5d6c443a [SCSI] cxgb3i, iser, iscsi_tcp: set target can queue
Set target can queue limit to the number of preallocated
session tasks we have.

This along with the cxgb3i can_queue patch will fix a throughput
problem where it could only queue one LU worth of data at a time.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-27 10:09:54 -05:00
Yossi Etigin
e028cc55cc IPoIB: Disable NAPI while CQ is being drained
If NAPI is enabled while IPoIB's CQ is being drained, it creates a
race on priv->ibwc between ipoib_poll() and ipoib_drain_cq(), leading
to memory corruption.

The solution is to enable/disable NAPI in ipoib_ib_dev_{open/stop}()
instead of in ipoib_{open/stop}(), and sync NAPI on the INITIALIZED
flag instead on the ADMIN_UP flag. This way NAPI will be disabled when
ipoib_drain_cq() is called.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1587>.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-04-20 13:58:08 -07:00
Linus Torvalds
0534c8cb5c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA/nes: Add support for new SFP+ PHY
  RDMA/nes: Add wide_ppm_offset parm for switch compatibility
  RDMA/nes: Fix SFP+ PHY initialization
  RDMA/nes: Fix nes_nic_cm_xmit() error handling
  RDMA/nes: Fix error handling issues
  RDMA/nes: Fix incorrect casts on 32-bit architectures
  IPoIB: Document newish features
  RDMA/cma: Create cm id even when IB port is down
  RDMA/cma: Use rate from IPoIB broadcast when joining IPoIB multicast groups
  IPoIB: Avoid free_netdev() BUG when destroying a child interface
  mlx4_core: Don't leak mailbox for SET_PORT on Ethernet ports
  RDMA/cxgb3: Release dependent resources only when endpoint memory is freed.
  RDMA/cxgb3: Handle EEH events
  IB/mlx4: Use pgprot_writecombine() for BlueFlame pages
2009-04-09 16:42:26 -07:00
Roland Dreier
edb5abb1e2 IPoIB: Avoid free_netdev() BUG when destroying a child interface
We have to release the RTNL before calling free_netdev() so that the
device state has a chance to become NETREG_UNREGISTERED.  Otherwise
when removing a child interface, we hit the BUG() that tests the
device state in free_netdev().

Reported-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-03-31 10:22:32 -07:00
Linus Torvalds
d54b3538b0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (119 commits)
  [SCSI] scsi_dh_rdac: Retry for NOT_READY check condition
  [SCSI] mpt2sas: make global symbols unique
  [SCSI] sd: Make revalidate less chatty
  [SCSI] sd: Try READ CAPACITY 16 first for SBC-2 devices
  [SCSI] sd: Refactor sd_read_capacity()
  [SCSI] mpt2sas v00.100.11.15
  [SCSI] mpt2sas: add MPT2SAS_MINOR(221) to miscdevice.h
  [SCSI] ch: Add scsi type modalias
  [SCSI] 3w-9xxx: add power management support
  [SCSI] bsg: add linux/types.h include to bsg.h
  [SCSI] cxgb3i: fix function descriptions
  [SCSI] libiscsi: fix possbile null ptr session command cleanup
  [SCSI] iscsi class: remove host no argument from session creation callout
  [SCSI] libiscsi: pass session failure a session struct
  [SCSI] iscsi lib: remove qdepth param from iscsi host allocation
  [SCSI] iscsi lib: have lib create work queue for transmitting IO
  [SCSI] iscsi class: fix lock dep warning on logout
  [SCSI] libiscsi: don't cap queue depth in iscsi modules
  [SCSI] iscsi_tcp: replace scsi_debug/tcp_debug logging with iscsi conn logging
  [SCSI] libiscsi_tcp: replace tcp_debug/scsi_debug logging with session/conn logging
  ...
2009-03-28 13:30:43 -07:00
Linus Torvalds
13220a94d3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1750 commits)
  ixgbe: Allow Priority Flow Control settings to survive a device reset
  net: core: remove unneeded include in net/core/utils.c.
  e1000e: update version number
  e1000e: fix close interrupt race
  e1000e: fix loss of multicast packets
  e1000e: commonize tx cleanup routine to match e1000 & igb
  netfilter: fix nf_logger name in ebt_ulog.
  netfilter: fix warning in ebt_ulog init function.
  netfilter: fix warning about invalid const usage
  e1000: fix close race with interrupt
  e1000: cleanup clean_tx_irq routine so that it completely cleans ring
  e1000: fix tx hang detect logic and address dma mapping issues
  bridge: bad error handling when adding invalid ether address
  bonding: select current active slave when enslaving device for mode tlb and alb
  gianfar: reallocate skb when headroom is not enough for fcb
  Bump release date to 25Mar2009 and version to 0.22
  r6040: Fix second PHY address
  qeth: fix wait_event_timeout handling
  qeth: check for completion of a running recovery
  qeth: unregister MAC addresses during recovery.
  ...

Manually fixed up conflicts in:
	drivers/infiniband/hw/cxgb3/cxio_hal.h
	drivers/infiniband/hw/nes/nes_nic.c
2009-03-26 15:54:36 -07:00
Roland Dreier
09f98bafea Merge branches 'cxgb3', 'endian', 'ipath', 'ipoib', 'iser', 'mad', 'misc', 'mlx4', 'mthca', 'nes' and 'sysfs' into for-next 2009-03-24 20:44:41 -07:00
Stephen Hemminger
fe8114e8e1 infiniband: convert ipoib to net_device_ops
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 19:19:14 -07:00
Mike Christie
5e7facb77f [SCSI] iscsi class: remove host no argument from session creation callout
We do not need to have llds set the host no for the session's
parent, because we know the session's parent is going to be
the host. This removes it from the session creation callback
and converts the drivers.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-03-13 15:29:39 -05:00
Mike Christie
4d1083509a [SCSI] iscsi lib: remove qdepth param from iscsi host allocation
The qdepth setting was useful when we needed libiscsi to verify
the setting. Now we just need to make sure if older tools
passed in zero then we need to set some default.

So this patch just has us use the sht->cmd_per_lun or if
for LLD does a host per session then we can set it on per
host basis.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-03-13 15:28:55 -05:00
Mike Christie
32ae763e3f [SCSI] iscsi lib: have lib create work queue for transmitting IO
We were using the shost work queue which ended up being
a little akward since all iscsi hosts need a thread for
scanning, but only drivers hooked into libiscsi need
a workqueue for transmitting. So this patch moves the
xmit workqueue to the lib.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-03-13 15:28:37 -05:00
Mike Christie
e28f3d5b51 [SCSI] libiscsi: don't cap queue depth in iscsi modules
There is no need to cap the queue depth in the modules. We set
this in userspace and can do that there. For performance testing
with ram based targets, this is helpful since we can have very
high queue depths.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-03-13 15:28:06 -05:00
Mike Christie
48a237a26d [SCSI] iser: have iser use its own logging
iser has its own logging inrfastrucutre. Convert it to use
it instead of libiscsi.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-03-13 15:26:51 -05:00
Or Gerlitz
1aedb7721f IB/iser: Remove hard setting of path MTU
Remove hard setting of the IB MTU used by iSER's RC queue-pair to 1K,
as this was done due to inter-op issues with an old iser target which
is not used any more.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-02-27 10:30:46 -08:00
Jack Morgenstein
71d98b4628 IPoIB: In unicast_arp_send(), only free newly-created paths
If path_rec_start() returns error, call path_free() only if the path
was newly-created.  If we free an existing path whose valid flag was zero,
(but do not detach it from the list) we cause corruption of the
path list (of which it is a member), and get a kernel crash.

The simplest solution is to not free an existing path -- just leave it
in the list as-is (i.e., with its valid flag cleared).

Thanks to Yossi Etigin of Voltaire for identifying the problem flow
which caused the kernel crash.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Moni Shua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-02-17 14:51:47 -08:00
Ben Hutchings
288379f050 net: Remove redundant NAPI functions
Following the removal of the unused struct net_device * parameter from
the NAPI functions named *netif_rx_* in commit 908a7a1, they are
exactly equivalent to the corresponding *napi_* functions and are
therefore redundant.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-21 14:33:50 -08:00
Yossi Etigin
3c20962086 IPoIB: Do not print error messages for multicast join retries
When IPoIB tries to join a multicast group, and the SA module's SM
address handle is NULL (because of an SM change, etc), the join
returns with -EAGAIN status.  In that case, don't print an error
message unless multicast debugging is enabled.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-01-16 13:42:59 -08:00
Roland Dreier
cbbe1efa49 IPoIB: Fix deadlock between ipoib_open() and child interface create
Fix a deadlock between child interface creation/deletion and ipoib
start/stop.  The former takes vlan_mutex, and then might take RTNL via
register_netdev()/unregister_netdev().  The latter is executed with
RTNL held, and tries to take vlan_mutex, which can lead to an AB-BA
deadlock.

Fix this by having the child interface creation/deletion code take the
RTNL first so vlan_mutex always nests inside RTNL.  We can use
register_netdevice() for child interfaces because we form the
interface name from the parent interface and hence don't need the '%'
expansion of register_netdev().

Reported-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-01-14 21:44:39 -08:00
Roland Dreier
b8a1b1ce14 IPoIB: Fix hang in napi_disable() if P_Key is never found
After commit fe25c561 ("IPoIB: Don't enable NAPI when it's already
enabled"), if an interface is brought up but the corresponding P_Key
never appears, then ipoib_stop() will hang in napi_disable(), because
ipoib_open() returns before it does napi_enable().

Fix this by changing ipoib_open() to call napi_enable() even if the
P_Key isn't present.

Reported-by: Yossi Etigin <yosefe@Voltaire.COM>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-01-14 14:55:41 -08:00
Linus Torvalds
ccbf04f24c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/iser: Add dependency on INFINIBAND_ADDR_TRANS
  IPoIB: Do not join broadcast group if interface is brought down
  RDMA/nes: Fix for NIPQUAD removal
  IPoIB: Fix loss of connectivity after bonding failover on both sides
  IB/mlx4: Don't register IB device for adapters with no IB ports
  mlx4_core: Fix warning from min()
  IB/ehca: spin_lock_irqsave() takes an unsigned long
2009-01-13 08:19:42 -08:00
Roland Dreier
8c9ea7fe96 Merge branches 'ehca', 'ipoib', 'iser', 'mlx4' and 'nes' into for-next 2009-01-12 19:37:31 -08:00
Randy Dunlap
f5eb3b7600 IB/iser: Add dependency on INFINIBAND_ADDR_TRANS
Fix ib_iser build to depend on INFINIBAND_ADDR_TRANS; if INET=y but
IPV6=n, then the RDMA CM is not built but INFINIBAND_ISER can be
enabled, leading to:

    ERROR: "rdma_destroy_id" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!
    ERROR: "rdma_connect" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!
    ERROR: "rdma_destroy_qp" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!
    ERROR: "rdma_create_id" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!
    ERROR: "rdma_create_qp" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!
    ERROR: "rdma_resolve_route" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!
    ERROR: "rdma_disconnect" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!
    ERROR: "rdma_resolve_addr" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
2009-01-12 19:30:41 -08:00
Yossi Etigin
50df48f59d IPoIB: Do not join broadcast group if interface is brought down
Because the ipoib_workqueue is not flushed when ipoib interface is
brought down, ipoib_mcast_join() may trigger a join to the broadcast
group after priv->broadcast was set to NULL (during cleanup).  This
will cause the system to be a member of the broadcast group when
interface is down.  As a side effect, this breaks the optimization of
setting the Q_key only when joining the broadcast group.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-01-12 19:28:42 -08:00
Yossi Etigin
a50df398cd IPoIB: Fix loss of connectivity after bonding failover on both sides
Fix bonding failover in the case both peers failover and the
gratuitous ARP is lost.  In that case, the sender side will create an
ipoib_neigh and issue a path request with the old GID first.  When
skb->dst->neighbour->ha changes due to ARP refresh, this ipoib_neigh
will not be added to the path->list of the path of the new GID,
because the ipoib_neigh already exists.  It will not have an AH
either, because of sender-side failover.  Therefore, it will not get
an AH when the path is resolved.

The solution here is to compare GIDs in ipoib_start_xmit() even if
neigh->ah is invalid.  Comparing with an uninitialized value of
neigh->dgid should be fine, since a spurious match is harmless (and
astronomically unlikely too).

Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-01-09 14:05:11 -08:00
Kay Sievers
d927e38c6c infiniband: struct device - replace bus_id with dev_name(), dev_set_name()
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-06 10:44:39 -08:00
Mike Christie
2ff79d52d5 [SCSI] libiscsi: pass opcode into alloc_pdu callout
We do not need to allocate a itt for data_out, so this
passes the opcode to the alloc_pdu callout.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-12-29 11:24:23 -06:00
Mike Christie
0f9c7449ce [SCSI] iser: convert iser to new alloc_pdu api
This just converts iser to new alloc_pdu api. It still
preallocates the pdu, so there is no difference.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-12-29 11:24:20 -06:00
Linus Torvalds
0191b625ca Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
  net: Allow dependancies of FDDI & Tokenring to be modular.
  igb: Fix build warning when DCA is disabled.
  net: Fix warning fallout from recent NAPI interface changes.
  gro: Fix potential use after free
  sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
  sfc: When disabling the NIC, close the device rather than unregistering it
  sfc: SFT9001: Add cable diagnostics
  sfc: Add support for multiple PHY self-tests
  sfc: Merge top-level functions for self-tests
  sfc: Clean up PHY mode management in loopback self-test
  sfc: Fix unreliable link detection in some loopback modes
  sfc: Generate unique names for per-NIC workqueues
  802.3ad: use standard ethhdr instead of ad_header
  802.3ad: generalize out mac address initializer
  802.3ad: initialize ports LACPDU from const initializer
  802.3ad: remove typedef around ad_system
  802.3ad: turn ports is_individual into a bool
  802.3ad: turn ports is_enabled into a bool
  802.3ad: make ntt bool
  ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
  ...

Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
2008-12-28 12:49:40 -08:00
Neil Horman
908a7a16b8 net: Remove unused netdev arg from some NAPI interfaces.
When the napi api was changed to separate its 1:1 binding to the net_device
struct, the netif_rx_[prep|schedule|complete] api failed to remove the now
vestigual net_device structure parameter.  This patch cleans up that api by
properly removing it..

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-22 20:43:12 -08:00
David Disseldorp
bba7ebba3b IB/iser: Avoid recv buffer exhaustion caused by unexpected PDUs
iSCSI/iSER targets may send PDUs without a prior request from the
initiator.  RFC 5046 refers to these PDUs as "unexpected".  NOP-In PDUs
with itt=RESERVED and Asynchronous Message PDUs occupy this category.

The amount of active "unexpected" PDU's an iSER target may have at any
time is governed by the MaxOutstandingUnexpectedPDUs key, which is not
yet supported.

Currently when an iSER target sends an "unexpected" PDU, the
initiators recv buffer consumed by the PDU is not replaced.  If over
initial_post_recv_bufs_num "unexpected" PDUs are received then the
receive queue will run out of receive work requests entirely.

This patch ensures recv buffers consumed by "unexpected" PDUs are
replaced in the next iser_post_receive_control() call.

Signed-off-by: David Disseldorp <ddiss@sgi.com>
Signed-off-by: Ken Sandars <ksandars@sgi.com>
Acked-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-12-21 13:56:50 -08:00
David S. Miller
198d6ba4d7 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/isdn/i4l/isdn_net.c
	fs/cifs/connect.c
2008-11-18 23:38:23 -08:00
Yossi Etigin
ff79ae8083 IPoIB: Fix crash in path_rec_completion()
Fix a crash in path_rec_completion() during an SM up/down loop.  If
more than one path record request is issued, the first completion
releases path->done, allowing ipoib_flush_paths() to free the path,
and thus corrupting it for the second completion.

Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM
change events") added the field path->valid and changed the test "if
(!path)" to "if (!path || !path->valid)".  This change made it
possible for a path with an outstanding query to pass the test and
issue another query on the same path.  Having two queries on the same
path leads to a crash.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1325>.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-11-12 10:24:39 -08:00
Yossi Etigin
93a3ab939b IPoIB: Fix hang in ipoib_flush_paths()
ipoib_flush_paths() can hang during an SM up/down loop: if
path_rec_start() fails (for instance, because there is no sm_ah), the
path is still added to the path list by neigh_add_path().  Then,
ipoib_flush_paths() will wait for path->done, but it will never
complete because the request was not issued at all.  Fix this by
completing path->done if issuing the query fails.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1329>.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-11-12 10:24:38 -08:00
Yossi Etigin
fe25c56190 IPoIB: Don't enable NAPI when it's already enabled
If a P_Key is not present when an interface is created, ipoib_open()
will return after doing napi_enable().  ipoib_open() will be called
again from ipoib_pkey_poll() when the P_Key appears, after NAPI has
already been enabled, and try to enable it again. This triggers a
BUG_ON() in napi_enable().

Fix this by moving the call to napi_enable() to after the test for
P_Key presence.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-11-12 10:24:36 -08:00
Harvey Harrison
63779436ab drivers: replace NIPQUAD()
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31 00:56:00 -07:00
Harvey Harrison
5b095d9892 net: replace %p6 with %pI6
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29 12:52:50 -07:00
Harvey Harrison
8c165a8383 infiniband: remove IPOIB_GID_RAW_ARG, IPOIB_GID_ARG, IPOIB_GID_FMT
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28 23:02:37 -07:00
Harvey Harrison
fcace2fe7a infiniband: ipoib replace IPOIB_GID_FMT with %p6
Replace all uses of IPOIB_GID_FMT, IPOIB_GID_RAW_ARG() and IPOIB_GID_ARG()

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28 23:02:36 -07:00
Harvey Harrison
8867cd7c86 infiniband: use %p6 for printing message ids
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28 23:02:35 -07:00
Linus Torvalds
724bdd097e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/ehca: Reject dynamic memory add/remove when ehca adapter is present
  IB/ehca: Fix reported max number of QPs and CQs in systems with >1 adapter
  IPoIB: Set netdev offload features properly for child (VLAN) interfaces
  IPoIB: Clean up ethtool support
  mlx4_core: Add Ethernet PCI device IDs
  mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC
  mlx4_core: Multiple port type support
  mlx4_core: Ethernet MAC/VLAN management
  mlx4_core: Get ethernet MTU and default address from firmware
  mlx4_core: Support multiple pre-reserved QP regions
  Update NetEffect maintainer emails to Intel emails
  RDMA/cxgb3: Remove cmid reference on tid allocation failures
  IB/mad: Use krealloc() to resize snoop table
  IPoIB: Always initialize poll_timer to avoid crash on unload
  IB/ehca: Don't allow creating UC QP with SRQ
  mlx4_core: Add QP range reservation support
  RDMA/ucma: Test ucma_alloc_multicast() return against NULL, not with IS_ERR()
2008-10-23 08:16:03 -07:00
Or Gerlitz
83bb63f62b IPoIB: Set netdev offload features properly for child (VLAN) interfaces
Child devices were created without any offload features set, fix this by
moving the code that computes the features into generic function which is
now called through non-child and child device creation.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>

-- v1 has a bug where the 'result' flag in ipoib_vlan_add may be used uninitialized
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-10-22 15:49:49 -07:00
Or Gerlitz
70c9c0db54 IPoIB: Clean up ethtool support
Add a get_rx_csum method.  Remove the driver's own get_tso method, as
the ethtool kernel code uses the default one if nothing is provided.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-10-22 15:49:29 -07:00
Linus Torvalds
ed09441dac Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (39 commits)
  [SCSI] sd: fix compile failure with CONFIG_BLK_DEV_INTEGRITY=n
  libiscsi: fix locking in iscsi_eh_device_reset
  libiscsi: check reason why we are stopping iscsi session to determine error value
  [SCSI] iscsi_tcp: return a descriptive error value during connection errors
  [SCSI] libiscsi: rename host reset to target reset
  [SCSI] iscsi class: fix endpoint id handling
  [SCSI] libiscsi: Support drivers initiating session removal
  [SCSI] libiscsi: fix data corruption when target has to resend data-in packets
  [SCSI] sd: Switch kernel printing level for DIF messages
  [SCSI] sd: Correctly handle all combinations of DIF and DIX
  [SCSI] sd: Always print actual protection_type
  [SCSI] sd: Issue correct protection operation
  [SCSI] scsi_error: fix target reset handling
  [SCSI] lpfc 8.2.8 v2 : Add statistical reporting control and additional fc vendor events
  [SCSI] lpfc 8.2.8 v2 : Add sysfs control of target queue depth handling
  [SCSI] lpfc 8.2.8 v2 : Revert target busy in favor of transport disrupted
  [SCSI] scsi_dh_alua: remove REQ_NOMERGE
  [SCSI] lpfc 8.2.8 : update driver version to 8.2.8
  [SCSI] lpfc 8.2.8 : Add MSI-X support
  [SCSI] lpfc 8.2.8 : Update driver to use new Host byte error code DID_TRANSPORT_DISRUPTED
  ...
2008-10-17 09:00:23 -07:00
Steven Whitehouse
a447c09324 vfs: Use const for kernel parser table
This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where its required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.

This was posted for review some time ago and I believe its been in -mm
since then.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-13 10:10:37 -07:00
Mike Christie
8e12452549 [SCSI] libiscsi: rename host reset to target reset
I had this in my patchset to add target reset support, but
it got dropped due to patching conflicts. This initial patch
just renames the function and users. We are actually just
dropping the session, and so this does not have anything to do
with the host exactly. It does for software iscsi because
we allocate a host per session, but for cxgb3i this makes no
sense.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-10-13 09:28:59 -04:00
Mike Christie
e5bd7b54e9 [SCSI] libiscsi: Support drivers initiating session removal
If the driver knows when hardware is removed like with cxgb3i,
bnx2i, qla4xxx and iser then we will want to remove the sessions/devices
that are bound to that device before removing the host.

cxgb3i and in the future bnx2i will remove the host and that will
remove all the sessions on the hba. iser can call iscsi_kill_session
when it gets an event that indicates that a hca is removed.
And when qla4xxx is hooked in to the lib (it is only hooked into
the class right now) it can call iscsi remove host like the
partial offload card drivers.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-10-13 09:28:59 -04:00
Roland Dreier
2767840a5c IPoIB: Always initialize poll_timer to avoid crash on unload
ipoib_ib_dev_stop() does del_timer_sync(&priv->poll_timer), but if a
P_key for an interface is not found, poll_timer is not initialized, so
this leads to a crash or hang.  Fix this by moving where poll_timer is
initialized to ipoib_ib_dev_init(), which is always called.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1172>.

Debugged-by: Yosef Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-10-10 15:58:52 -07:00
Roland Dreier
943c246e9b IPoIB: Use netif_tx_lock() and get rid of private tx_lock, LLTX
Currently, IPoIB is an LLTX driver that uses its own IRQ-disabling
tx_lock.  Not only do we want to get rid of LLTX, this actually causes
problems because of the skb_orphan() done with this tx_lock held: some
skb destructors expect to be run with interrupts enabled.

The simplest fix for this is to get rid of the driver-private tx_lock
and stop using LLTX.  We kill off priv->tx_lock and use
netif_tx_lock[_bh]() instead; the patch to do this is a tiny bit
tricky because we need to update places that take priv->lock inside
the tx_lock to disable IRQs, rather than relying on tx_lock having
already disabled IRQs.

Also, there are a couple of places where we need to disable BHs to
make sure we have a consistent context to call netif_tx_lock() (since
we no longer can use _irqsave() variants), and we also have to change
ipoib_send_comp_handler() to call drain_tx_cq() through a timer rather
than directly, because ipoib_send_comp_handler() runs in interrupt
context and drain_tx_cq() must run in BH context so it can call
netif_tx_lock().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-09-30 10:36:21 -07:00
Roland Dreier
c9da4bad5b IPoIB: Fix crash when path record fails after path flush
Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM
change events") changed how paths are flushed on an SM event.  This
change introduces a problem if the path record query triggered by
fails, causing path->ah to become NULL.  A later successful path query
will then trigger WARN_ON() in path_rec_completion(), and crash
because path->ah has already been freed, so the ipoib_put_ah() inside
the lock in path_rec_completion() may actually drop the last reference
(contrary to the comment that claims this is safe).

Fix this by updating path->ah and freeing old_ah only when the path
record query is successful.  This prevents the neighbour AH and that
path AH from getting out of sync.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1194>

Reported-by: Rabah Salem <ravah@mellanox.com>
Debugged-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-09-25 15:26:15 -07:00
Yossi Etigin
e8224e4b80 IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop()
Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with
ipoib_stop().  We avoid it by scheduling the piece of code that takes
the lock on ipoib_workqueue instead of executing it directly.  This
works because we only flush the ipoib_workqueue with the RTNL not held.

The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down()
which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(),
which calls ipoib_mcast_leave(). The latter calls
ib_sa_free_multicast(), and this waits until the multicast completion
handler finishes.  This handler is ipoib_mcast_join_complete(), which
waits for the rtnl_lock(), which was already taken by ipoib_stop().

This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on
RTNL in ipoib_stop()").

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-09-16 11:57:45 -07:00
Adrian Bunk
7a8fc9b248 removed unused #include <linux/version.h>'s
This patch lets the files using linux/version.h match the files that
#include it.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-23 12:14:12 -07:00
Roland Dreier
a77a57a1a2 IPoIB: Fix deadlock on RTNL in ipoib_stop()
Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue.  However, ipoib_stop() (which is run
inside rtnl_lock()) flushes this workqueue, which leads to a deadlock
if the join task is pending.

Fix this by simply not flushing the workqueue from ipoib_stop().  It
turns out that we really don't care about workqueue tasks running
during or after ipoib_stop(), as long as we make sure to flush the
workqueue before unregistering a netdev.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1114>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-19 15:01:32 -07:00
David J. Wilder
b1404069f6 IPoIB/cm: Use vmalloc() to allocate rx_rings
There are users that are running UDP applications that require a large
receive queue size in order to get good performance.  To prevent
allocation failures for rx_rings when using non-SRQ mode and large
recv_queue_size (1K or larger), use vmalloc() instead of kcalloc() to
alocate rx_rings.

Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-08 15:51:29 -07:00
Roland Dreier
e08198169e IPoIB/cm: Set correct SG list in ipoib_cm_init_rx_wr()
wr->sg_list should be set to the sge pointer passed in, not
priv->cm.rx_sge.

Reported-by: Hoang-Nam Nguyen <HNGUYEN@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-30 07:21:46 -07:00
Roland Dreier
9905922446 IPoIB: Correct help text for INFINIBAND_IPOIB_DEBUG
The help text for INFINIBAND_IPOIB_DEBUG refers to "ipoib_debugfs,"
which no longer exists.  Correct this to talk about the files under
debugfs that are really created.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-24 20:37:25 -07:00
Roland Dreier
99c3a5a9e3 IPoIB/cm: Connected mode is no longer EXPERIMENTAL
Connected mode is now tested and used by lots of people.  No need to
hide it under CONFIG_EXPERIMENTAL.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-24 20:37:25 -07:00
Roland Dreier
2cc177364e Merge branches 'bkl-removal', 'cma', 'ehca', 'for-2.6.27', 'mlx4', 'mthca' and 'nes' into for-linus 2008-07-24 08:38:47 -07:00
Or Gerlitz
01b3fc8b15 IPoIB: Include err code in trace message for ib_sa_path_rec_get() failures
Print the return code of ib_sa_path_rec_get() if it fails to help
debug errors.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:18:34 -07:00
Or Gerlitz
2f5de15128 IB/iser: Add support for RDMA_CM_EVENT_ADDR_CHANGE event
Enhance iser to act upon notification on network stack changes that
make its RDMA connection unaligned with the link used by the stack for
the <src,dst> IPs used to establish the connection.

When RDMA_CM_EVENT_ADDR_CHANGE arrives, just disconnect the
connection, assuming that the user space iscsid daemon will reconnect,
and the new connection will be aligned with the IP stack.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:16:21 -07:00
David S. Miller
49997d7515 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	Documentation/powerpc/booting-without-of.txt
	drivers/atm/Makefile
	drivers/net/fs_enet/fs_enet-main.c
	drivers/pci/pci-acpi.c
	net/8021q/vlan.c
	net/iucv/iucv.c
2008-07-18 02:39:39 -07:00
Linus Torvalds
89a93f2f48 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (102 commits)
  [SCSI] scsi_dh: fix kconfig related build errors
  [SCSI] sym53c8xx: Fix bogus sym_que_entry re-implementation of container_of
  [SCSI] scsi_cmnd.h: remove double inclusion of linux/blkdev.h
  [SCSI] make struct scsi_{host,target}_type static
  [SCSI] fix locking in host use of blk_plug_device()
  [SCSI] zfcp: Cleanup external header file
  [SCSI] zfcp: Cleanup code in zfcp_erp.c
  [SCSI] zfcp: zfcp_fsf cleanup.
  [SCSI] zfcp: consolidate sysfs things into one file.
  [SCSI] zfcp: Cleanup of code in zfcp_aux.c
  [SCSI] zfcp: Cleanup of code in zfcp_scsi.c
  [SCSI] zfcp: Move status accessors from zfcp to SCSI include file.
  [SCSI] zfcp: Small QDIO cleanups
  [SCSI] zfcp: Adapter reopen for large number of unsolicited status
  [SCSI] zfcp: Fix error checking for ELS ADISC requests
  [SCSI] zfcp: wait until adapter is finished with ERP during auto-port
  [SCSI] ibmvfc: IBM Power Virtual Fibre Channel Adapter Client Driver
  [SCSI] sg: Add target reset support
  [SCSI] lib: Add support for the T10 (SCSI) Data Integrity Field CRC
  [SCSI] sd: Move scsi_disk() accessor function to sd.h
  ...
2008-07-15 18:58:04 -07:00
David S. Miller
b9e4085768 netdev: Do not use TX lock to protect address lists.
Now that we have a specific lock to protect the network
device unicast and multicast lists, remove extraneous
grabs of the TX lock in cases where the code only needs
address list protection.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15 00:15:08 -07:00
David S. Miller
e308a5d806 netdev: Add netdev->addr_list_lock protection.
Add netif_addr_{lock,unlock}{,_bh}() helpers.

Use them to protect operations that operate on or read
the network device unicast and multicast address lists.

Also use them in cases where the code simply wants to
block calls into the driver's ->set_rx_mode() and
->set_multicast_list() methods.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15 00:13:44 -07:00
Eli Cohen
bc3a290b51 IPoIB: Double default RX/TX ring sizes
Increase IPoIB ring sizes to twice their original sizes (RX: 128->256,
TX: 64->128) to act as a shock absorber for high traffic peaks.  With
the current settings, we have seen cases that there are many calls to
netif_stop_queue(), which causes degradation in throughput.  Also,
larger receive buffer sizes help IPoIB in CM mode to avoid experiencing
RNR NAK conditions due to insufficient receive buffers at the SRQ.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:52 -07:00
Eli Cohen
e112373fd6 IPoIB/cm: Reduce connected mode TX object size
Since IPoIB connected mode does not NETIF_F_SG, we only have one DMA
mapping per send, so we don't need a mapping[] array.  Define a new
struct with a single u64 mapping member and use it for the CM tx_ring.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:52 -07:00
Eli Cohen
bd3606715e IPoIB: Use dev_set_mtu() to change mtu
When the driver sets the MTU of the net device outside of its
change_mtu method, it should make use of dev_set_mtu() instead of
directly setting the mtu field of struct netdevice.  Otherwise
functions registered to be called upon MTU change will not get called
(this is done through call_netdevice_notifiers() in dev_set_mtu()).

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:51 -07:00
Eli Cohen
c8c2afe360 IPoIB: Use rtnl lock/unlock when changing device flags
Use of this lock is required to synchronize changes to the netdvice's
data structs.  Also move the call to ipoib_flush_paths() after the
modification of the netdevice flags in set_mode().

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:51 -07:00
Roland Dreier
9eae554c17 IPoIB: Get rid of ipoib_mcast_detach() wrapper
ipoib_mcast_detach() does nothing except call ib_detach_mcast(), so just
use the core API in the one place that does a multicast group detach.

add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-105 (-105)
function                                     old     new   delta
ipoib_mcast_leave                            357     319     -38
ipoib_mcast_detach                            67       -     -67

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:50 -07:00
Eli Cohen
d0de13622d IPoIB: Only set Q_Key once: after joining broadcast group
The current code will set the Q_Key for any join of a non-sendonly
multicast group.  The operation involves a modify QP operation, which
is fairly heavyweight, and is only really required after the join of
the broadcast group.  Fix this by adding a parameter to ipoib_mcast_attach()
to control when the Q_Key is set.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:50 -07:00
Eli Cohen
5892eff91a IPoIB: Remove priv->mcast_mutex
No need for a mutex around calls to ib_attach_mcast/ib_detach_mcast
since these operations are synchronized at the HW driver layer.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:50 -07:00
Eli Cohen
c03d4731b5 IPoIB: Remove unused IPOIB_MCAST_STARTED code
The IPOIB_MCAST_STARTED flag is not used at all since commit b3e2749b
("IPoIB: Don't drop multicast sends when they can be queued"), so
remove it.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:50 -07:00
Moni Shoua
ee1e2c82c2 IPoIB: Refresh paths instead of flushing them on SM change events
The patch tries to solve the problem of device going down and paths being
flushed on an SM change event. The method is to mark the paths as candidates for
refresh (by setting the new valid flag to 0), and wait for an ARP
probe a new path record query.

The solution requires a different and less intrusive handling of SM
change event. For that, the second argument of the flush function
changes its meaning from a boolean flag to a level.  In most cases, SM
failover doesn't cause LID change so traffic won't stop.  In the rare
cases of LID change, the remote host (the one that hadn't changed its
LID) will lose connectivity until paths are refreshed. This is no
worse than the current state.  In fact, preventing the device from
going down saves packets that otherwise would be lost.

Signed-off-by: Moni Levy <monil@voltaire.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:49 -07:00
Vladimir Sokolovsky
af40da894e IPoIB: add LRO support
Add "ipoib_use_lro" module parameter to enable LRO and an
"ipoib_lro_max_aggr" module parameter to set the max number of packets
to be aggregated.  Make LRO controllable and LRO statistics accessible
through ethtool.

Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Ron Livne
1240673405 IPoIB: Use multicast loopback blocking if available
Set IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK for IPoIB's UD QPs if
supported by the underlying device.  This creates an improvement of up
to 39% in bandwidth when sending multicast packets with IPoIB, and an
improvment of 12% in cpu usage.

Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Roland Dreier
a7d834c4bc IPoIB/cm: Fix racy use of receive WR/SGL in ipoib_cm_post_receive_nonsrq()
For devices that don't support SRQs, ipoib_cm_post_receive_nonsrq() is
called from both ipoib_cm_handle_rx_wc() and ipoib_cm_nonsrq_init_rx(),
and these two callers are not synchronized against each other.
However, ipoib_cm_post_receive_nonsrq() always reuses the same receive
work request and scatter list structures, so multiple callers can end
up stepping on each other, which leads to posting garbled work
requests.

Fix this by having the caller pass in the ib_recv_wr and ib_sge
structures to use, and allocating new local structures in
ipoib_cm_nonsrq_init_rx().

Based on a patch by Pradeep Satyanarayana <pradeep@us.ibm.com> and
David Wilder <dwilder@us.ibm.com>, with debugging help from Hoang-Nam
Nguyen <hnguyen@de.ibm.com>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:47 -07:00
Eli Cohen
f89271da32 IPoIB: Copy small received SKBs in connected mode
The connected mode implementation in the IPoIB driver has a large
overhead in the way SKBs are handled in the receive flow.  It usually
allocates an SKB with as big as was used in the currently received SKB
and moves unused fragments from the old SKB to the new one. This
involves a loop on all the remaining fragments and incurs overhead on
the CPU.  This patch, for small SKBs, allocates an SKB just large
enough to contain the received data and copies to it the data from the
received SKB.  The newly allocated SKB is passed to the stack and the
old SKB is reposted.

When running netperf, UDP small messages, without this pach I get:

    UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
    14.4.3.178 (14.4.3.178) port 0 AF_INET
    Socket  Message  Elapsed      Messages
    Size    Size     Time         Okay Errors   Throughput
    bytes   bytes    secs            #      #   10^6bits/sec

    114688     128   10.00     5142034      0     526.31
    114688           10.00     1130489            115.71

With this patch I get both send and receive at ~315 mbps.

The reason that send performance actually slows down is as follows:
When using this patch, the overhead of the CPU for handling RX packets
is dramatically reduced.  As a result, we do not experience RNR NAK
messages from the receiver which cause the connection to be closed and
reopened again; when the patch is not used, the receiver cannot handle
the packets fast enough so there is less time to post new buffers and
hence the mentioned RNR NACKs.  So what happens is that the
application *thinks* it posted a certain number of packets for
transmission but these packets are flushed and do not really get
transmitted.  Since the connection gets opened and closed many times,
each time netperf gets the CPU time that otherwise would have been
given to IPoIB to actually transmit the packets.  This can be verified
when looking at the port counters -- the output of ifconfig and the
oputput of netperf (this is for the case without the patch):

    tx packets
    ==========
    port counter:   1,543,996
    ifconfig:       1,581,426
    netperf:        5,142,034

    rx packets
    ==========
    netperf         1,1304,089

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
2008-07-14 23:48:44 -07:00
Roland Dreier
f3781d2e89 RDMA: Remove subversion $Id tags
They don't get updated by git and so they're worse than useless.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:44 -07:00
Roland Dreier
969a60f9db IB/srp: Remove use of cached P_Key/GID queries
The SRP initiator is currently using ib_find_cached_pkey() and
ib_get_cached_gid() in situations where the uncached ib_find_pkey()
and ib_query_gid() functions serve just as well: sleeping is allowed
and performance is not an issue.  Since we want to eliminate the
cached operations in the long term, convert SRP to use the uncached
variants.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:43 -07:00
Mike Christie
8e9a20cee4 [SCSI] libiscsi, iscsi_tcp, ib_iser: fix setting of can_queue with old tools.
This patch fixes two bugs that are related.

1. Old tools did not set can_queue/cmds_max. This patch modifies
libiscsi so that when we add the host we catch this and set it
to the default.

2. iscsi_tcp thought that the scsi command that was passed to
the eh functions needed a iscsi_cmd_task allocated for it. It
only needed a mgmt task, and now it does not matter since it
all comes from the same pool and libiscsi handles this for the
drivers. ib_iser had copied iscsi_tcp's code and set can_queue
to its max - 1 to handle this. So this patch removes the max -1,
and just sets it to the max.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:29 -05:00
Mike Christie
913e5bf435 [SCSI] libiscsi, iser, tcp: remove recv_lock
The recv lock was defined so the iscsi layer could block
the recv path from processing IO during recovery. It
turns out iser just set a lock to that pointer which was pointless.

We now disconnect the transport connection before doing recovery
so we do not need the recv lock. For iscsi_tcp we still stop
the recv path incase older tools are being used.

This patch also has iscsi_itt_to_ctask user grab the session lock
and has the caller access the task with the lock or get a ref
to it in case the target is broken and sends a tmf success response
then sends data or a response for the command that was supposed to
be affected bty the tmf.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:22 -05:00
Mike Christie
88dfd340b9 [SCSI] iscsi class: Add session initiatorname and ifacename sysfs attrs.
This adds two new attrs used for creating initiator ports and
binding sessions to hardware.

The session level initiatorname:

Since bnx2i does a scsi_host per host device, we need to add the
iface initiator port settings on the session, so we can create
multiple initiator ports (each with different inames) per device/scsi_host.

The current iname reflects that qla4xxx can have one iname per hba, and we are
allocating a host per session for software. The iname on the host will
remain so we can export and set the hba level qla4xxx setting.

The ifacename attr:

To bind a session to a some peice of hardware in userspace we maintain
some mappings, but during boot or iscsid restart (iscsid contains the user
space part of the driver) we need to be able to figure out which of those
host mappings abstractions maps to certain sessions. This patch adds
a ifacename attr, which userspace can set to id the host side of the
endpoint across pivot_roots and iscsid restarts.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:21 -05:00
Mike Christie
412eeafa0a [SCSI] iser: Modify iser to take a iscsi_endpoint struct in ep callouts and session setup
This hooks iser into the iscsi endpoint code. Previously it handled the
lookup and allocation. This has been made generic so bnx2i and iser can
share it. It also allows us to pass iser the leading conn's ep, so we
know the ib_deivce being used and can set it as the scsi_host's parent.
And that allows scsi-ml to set the dma_mask based on those values.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:21 -05:00
Mike Christie
7970634b81 [SCSI] iscsi class: user device_for_each_child instead of duplicating session list
Currently we duplicate the list of sessions, because we were using the
test for if a session was on the host list to indicate if the session
was bound or unbound. We can instead use the target_id and fix up
the class so that drivers like bnx2i do not have to manage the target id
space.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:20 -05:00
Mike Christie
2261ec3d68 [SCSI] iser: handle iscsi_cmd_task rename
This handles the iscsi_cmd_task rename and renames
the iser cmd task to iser task.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:20 -05:00
Mike Christie
2747fdb257 [SCSI] iser: convert ib_iser to support merged tasks
Convert ib_iser to support merged tasks.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:19 -05:00
Mike Christie
0af967f5d4 [SCSI] libiscsi, iscsi_tcp, iser: add session cmds array accessor
Currently to get a ctask from the session cmd array, you have to
know to use the itt modifier. To make this easier on LLDs and
so in the future we can easilly kill the session array and use
the host shared map instead, this patch adds a nice wrapper
to strip the itt into a session->cmds index and return a ctask.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:18 -05:00
Mike Christie
b40977d95f [SCSI] iser: fix handling of scsi cmnds during recovery.
After the stop_conn callback has returned the LLD should not
touch the scsi cmds. iscsi_tcp and libiscsi use the
conn->recv_lock and suspend_rx field to halt recv path
processing, but iser does not have any protection.

This patch modifies iser so that userspace can just
call the ep_disconnect callback, which will halt
all recv IO, before calling the stop_conn callback so
we do not have to worry about the conn->recv_lock and
suspend rx field. iser just needs to stop the send side
from accessing the ib conn.

Fixup to handle when the ep poll fails and ep disconnect
is called from Erez.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:17 -05:00
Mike Christie
5d91e209fb [SCSI] iscsi: remove session/conn_data_size from iscsi_transport
This removes the session and conn data_size fields from the iscsi_transport.
Just pass in the value like with host allocation. This patch also makes
it so the LLD iscsi_conn data is allocated with the iscsi_cls_conn.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:16 -05:00
Mike Christie
a4804cd6eb [SCSI] iscsi: add iscsi host helpers
This finishes the host/session unbinding, by adding some helpers
to add and remove hosts and the session they manage.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:16 -05:00
Mike Christie
756135215e [SCSI] iscsi: remove session and host binding in libiscsi
bnx2i allocates a host per netdevice but will use libiscsi,
so this unbinds the session from the host in that code.

This will also be useful for the iser parent device dma settings
fixes.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:16 -05:00
Mike Christie
d3826721b1 [SCSI] iscsi class, iscsi drivers: remove unused iscsi_transport attrs
max_cmd_len and max_conn are not really used. max_cmd_len is
always 16 and can be set by the LLD. max_conn is always one
since we do not support MCS.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:15 -05:00
Mike Christie
40753caa36 [SCSI] iscsi class, iscsi_tcp/iser: add host arg to session creation
iscsi offload (bnx2i and qla4xx) allocate a scsi host per hba,
so the session creation path needs a shost/host_no argument.
Software iscsi/iser will follow the same behabior as before
where it allcoates a host per session, but in the future iser
will probably look more like bnx2i where the host's parent is
the hardware (rnic for iser and for bnx2i it is the nic), because
it does not use a socket layer like how iscsi_tcp does.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:15 -05:00
Jack Morgenstein
e1d50dce5a IPoIB: Test for NULL broadcast object in ipiob_mcast_join_finish()
We saw a kernel oops in our regression testing when a multicast "join
finish" occurred just after the interface was -- this is
<https://bugs.openfabrics.org/show_bug.cgi?id=1040>.  The test
randomly causes the HCA physical port to go down then up.

The cause of this is that ipoib_mcast_join_finish() processing happen
just after ipoib_mcast_dev_flush() was invoked (in which case the
broadcast pointer is NULL).  This patch tests for and handles the case
where priv->broadcast is NULL.

Cc: <stable@kernel.org>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-20 15:41:09 -07:00
Eli Cohen
57ce41d1d1 IB/ipoib: Fix transmit queue stalling forever
Commit f56bcd80 ("IPoIB: Use separate CQ for UD send completions")
introduced a bug where the transmit queue could get stopped and never
woken up.  The problem is that send completions are only polled at the
end of the xmit function, so if the send queue fills up and the xmit
path stops the queue, then there is no way for send completions to
ever get polled, and so the transmit queue stays stopped forever.

Fix this by arming the send CQ just before posting the last send
request that fills the send queue.  Then, when the completion event
handler is called, drain the send CQ.  Since it is possible that not
enough send completions are in the CQ, verify that the the net queue
has been woken up after draining the send CQ, and if not arm a timer
and drain again at the timer function.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-30 20:02:45 -07:00
Eli Cohen
b4132efa1a IPoIB: Copy child MTU from parent
When creating a child interface, copy the MTU information from the
parent.  Otherwise when the child's multicast join completes, the MTU
will not be updated since the code does

	dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);

and priv->admin_mtu will be set to 0.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
Eli Cohen
f56bcd8013 IPoIB: Use separate CQ for UD send completions
Use a dedicated CQ for UD send completions. Also, do not arm the UD
send CQ, which reduces the number of interrupts generated.  This patch
farther reduces overhead by not calling poll CQ for every posted send
WR -- it does polls only when there 16 or more outstanding work requests.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
Eli Dorfman
87528227df IB/iser: Count FMR alignment violations per session
Count FMR alignment violations per session as part of the iscsi
statistics.

Signed-off-by: Eli Dorfman <elid@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:52 -07:00
Eli Dorfman
6f735e36ba IB/iser: Move high-volume debug output to higher debug level
Add another level for debug.

Signed-off-by: Eli Dorfman <elid@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:52 -07:00
Shirley Ma
bc7b3a36ba IPoIB: Handle 4K IB MTU for UD (datagram) mode
This patch enables IPoIB to use 4K UD messages (when the underlying
device and fabrics support a 4K MTU) by using two scatter buffers when
PAGE_SIZE is less than or equal to thhe HCA IB MTU size.  The first
buffer is for IPoIB header + GRH header, and the second buffer is the
IPoIB payload, which is 4K-4.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Tony Jones
ee959b00c3 SCSI: convert struct class_device to struct device
It's big, but there doesn't seem to be a way to split it up smaller...

Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-19 19:10:33 -07:00
Greg Kroah-Hartman
0532193746 IB: rename "dev" to "srp_dev" in srp_host structure
This sets us up to be able to convert the srp_host to use a struct
device instead of a class_device.

Based on a original patch from Tony Jones, but split up into this piece
by Greg.

Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Reviewed-by: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-19 19:10:30 -07:00
Erez Zilber
0a22ab92f5 IB/iser: Don't change itt endianness
The itt field in struct iscsi_data is not defined with any particular
endianness.  open-iscsi should use it as-is without byte-swapping it.
This fixes sparse warnings coming from doing ntohl(hdr->itt).

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:35 -07:00
Roland Dreier
9fdd5e5bf6 IPoIB: Handle case when P_Key is deleted and re-added at same index
If a P_Key is deleted and then re-added at the same index, then IPoIB
gets confused because __ipoib_ib_dev_flush() only checks whether the
index is the same without checking whether the P_Key was present, so
the interface is stopped when the P_Key is deleted, but the event when
the P_Key is re-added gets ignored and the interface never gets
restarted.

Also, switch to using ib_find_pkey() instead of ib_find_cached_pkey()
everywhere in IPoIB, since none of the places that look for P_Keys are
in a fast path or in non-sleeping context, and in general we want to
kill off the whole caching infrastructure eventually.  This also fixes
consistency problems caused because some IPoIB queries were cached and
some were uncached during the window where the cache was not updated.

Thanks to Venkata Subramonyam <vsubramo@cisco.com> for debugging this
problem and testing this fix.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:35 -07:00
Erez Zilber
d97c51707d IB/iser: Release connection resources on RDMA_CM_EVENT_DEVICE_REMOVAL event
When a RDMA_CM_EVENT_DEVICE_REMOVAL event is raised, iSER should
release the connection resources.

This is necessary when the IB HCA module is unloaded while open-iscsi
is still running.  Currently, iSER just BUG()s.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:35 -07:00
Eli Cohen
28d52b3cd8 IPoIB: Support modifying IPoIB CQ event moderation
This can be used to tune at run time the parameters controlling the
event (interrupt) generation rate and thus reduce the overhead
incurred by handling interrupts resulting in better throughput.  Since
IPoIB uses a single CQ for both RX and TX, RX is chosen to dictate
configuration for both RX and TX.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:33 -07:00
Eli Cohen
82c24c18af IPoIB: Add basic ethtool support
Just add the infrastructure so we can add functionality later.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:32 -07:00
Eli Cohen
40ca1988e0 IPoIB: Add LSO support
For HCAs that support TCP segmentation offload (IB_DEVICE_UD_TSO), set
NETIF_F_TSO and use HW LSO to offload TCP segmentation.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:27 -07:00
Robert P. J. Day
157de22946 IB: Use shorter list_splice_init() for brevity
Convert list_splice() + INIT_LIST_HEAD() to the equivalent list_splice_init()

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:26 -07:00
David Dillow
1e89a1946c IB/srp: Enforce protocol limit on srp_sg_tablesize
The current SRP initiator will allow unlimited s/g entries in the
indirect descriptors lists, but the entry count field in the SRP_CMD
request is 8 bits, so setting srp_sg_tablesize too large will open the
possibility of wrapping the count and generating invalid requests.

Clamp srp_sg_tablesize to the protocol limits to prevent surprises.

Reported by Martin W. Schlining III <mschlining@datadirectnet.com>.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:12 -07:00
Eli Cohen
6046136c74 IPoIB: Use checksum offload support if available
For HCAs that support checksum offload (ie that set IB_DEVICE_UD_IP_CSUM
in the device capabilities flags), have IPoIB set NETIF_F_IP_CSUM and
use the HCA to generate and verify IP checksums.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:10 -07:00
Roland Dreier
10313cbb92 IPoIB: Allocate priv->tx_ring with vmalloc()
Commit 7143740d ("IPoIB: Add send gather support") made struct
ipoib_tx_buf significantly larger, since the mapping member changed
from a single u64 to an array with MAX_SKB_FRAGS + 1 entries.  This
means that allocating tx_rings with kzalloc() may fail because there
is not enough contiguous memory for the new, much bigger size.  Fix
this regression by allocating the rings with vmalloc() instead.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-12 07:51:03 -07:00
Roland Dreier
4200406b8f IPoIB/cm: Set tx_wr.num_sge in connected mode post_send()
Commit 7143740d ("IPoIB: Add send gather support") made it possible
for tx_wr.num_sge to be != 1 -- this happens if send gather support is
enabled.  However, the code in the connected mode post_send() function
assumes the old invariant, namely that tx_wr.num_sge is always 1.  Fix
this by explicitly setting tx_wr.num_sge to 1 in the CM post_send().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-11 18:35:20 -07:00
Or Gerlitz
b3e2749bf3 IPoIB: Don't drop multicast sends when they can be queued
When set_multicast_list() is called the multicast task is restarted
and the IPOIB_MCAST_STARTED bit is cleared.  As a result for some
window of time, multicast packets are not transmitted nor queued but
rather dropped by ipoib_mcast_send().  These dropped packets are
painful in two cases:

 - bonding fail-over which both calls set_multicast_list() on the new
   active slave and sends Gratuitous ARP through that slave.

 - IP_DROP_MEMBERSHIP code which both calls set_multicast_list() on the
   device and issues IGMP leave.

In both these cases, depending on the scheduling of the IPoIB
multicast task, the packets would be dropped.  As a result, in the
bonding case, the failover would not be detected by the peers until
their neighbour is renewed the neighbour (which takes a few tens of
seconds).  In the IGMP case, the IP router doesn't get an IGMP leave
and would only learn on that from further probes on the group (also a
delay of at least a few tens of seconds).

Fix this by allowing transmission (or queuing) depending on the
IPOIB_FLAG_OPER_UP flag instead of the IPOIB_MCAST_STARTED flag.

Signed-off-by: Olga Shern <olgas@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-11 14:12:03 -07:00
Arne Redlich
d33ed425c6 IB/iser: Handle iser_device allocation error gracefully
"iser_device" allocation failure is "handled" with a BUG_ON() right
before dereferencing the NULL-pointer - fix this!

Signed-off-by: Arne Redlich <arne.redlich@xiranet.com>
Signed-off-by: Erez Zilber <erezz@voltaire.com>
2008-03-10 21:17:51 -07:00
Arne Redlich
9a378270c0 IB/iser: Fix list iteration bug
The iteration through the list of "iser_device"s during device
lookup/creation is broken -- it might result in an infinite loop if
more than one HCA is used with iSER.  Fix this by using
list_for_each_entry() instead of the open-coded flawed list iteration
code.

Signed-off-by: Arne Redlich <arne.redlich@xiranet.com>
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-10 21:15:49 -07:00
Pradeep Satyanarayana
ec229e5e81 IPoIB/cm: Fix ipoib_cm_dev_stop() cleanup when drain times out
Commit efcd9971 ("IPoIB/cm: Factor out ipoib_cm_free_rx_reap_list()")
introduced a bug in ipoib_cm_dev_stop() when the receive drain times
out.  In that case, the function moves all the pending rx stuff into a
private list but then calls ipoib_cm_free_rx_reap_list(), which
handles a different list.

Fix this by moving everything to the rx_reap_list that will actually
get freed up.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=906>.

Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-19 10:25:11 -08:00
Eli Cohen
a9d1884925 IPoIB: Remove unused struct ipoib_cm_tx.ibwc member
struct ipoib_cm_tx.ibwc is unused since commit 1b524963 ("IPoIB/cm:
Use common CQ for CM send completions"), so remove it.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
2008-02-14 10:30:50 -08:00
Jack Morgenstein
167c42655c IPoIB: On P_Key change event, reset state properly
In P_Key event handling, if the old P_Key is no longer available, the
driver must call ipoib_ib_dev_stop() -- just as it does when the P_Key
is still available (see procedure __ipoib_ib_dev_flush()).

When a P_Key becomes available, the driver will perform ipoib_open(),
which assumes that the QP is in RESET, the cm_id has been
destroyed/deleted, etc.  If ipoib_ib_dev_stop() is not called as
described above, then these assumptions will be false, and the attempt
to bring the interface up will fail.

Found by Mellanox QA.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-14 10:15:06 -08:00
Eli Cohen
7143740d26 IPoIB: Add send gather support
This patch acts as a preparation for using checksum offload for IB
devices capable of inserting/verifying checksum in IP packets.  The
patch does not actaully turn on NETIF_F_SG - we defer that to the
patches adding checksum offload capabilities.

We only add support for send gathers for datagram mode, since existing
HW does not support checksum offload on connected QPs.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-08 14:32:37 -08:00
Eli Cohen
eb14032f9e IPoIB: Add high DMA feature flag
All current InfiniBand devices can handle all DMA addresses, and it's
hard to imagine anyone would be silly enough to build a new device
that couldn't.  Therefore, enable the NETIF_F_HIGHDMA feature for IPoIB.

This has no effect for no, but is needed when we enable gather/scatter
support and checksum stateless offloads.

Signed-off-by: Eli Cohen <eli@mellnaox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-08 13:39:26 -08:00
David Dillow
9fe4bcf45e IB/srp: Retry stale connections
When a host just goes away (crash, power loss, etc.) without tearing
down its IB connections, it can get stale connection errors when it
tries to reconnect to targets upon rebooting.  Retrying the connection
a few times will prevent sysadmins from playing the "which disk(s)
went missing?" game.

This would have made things slightly quicker when tracking down some
of the recent bugs, but it also helps quite a bit when you've got a
large number of targets hanging off a wedged server.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:43 -08:00
Or Gerlitz
7bc531dd88 IPoIB: Remove a misleading debug print
Commit 732a2170 ("IB/ipoib: Bound the net device to the ipoib_neigh
structue") left a misleading debug print (n->dev would be a bond
device only if boding is used).  Clean it up.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:43 -08:00
Or Gerlitz
bafff97417 IPoIB: Handle bonding failover race for connected neighbours too
Move up the code that checks for a situation where the remote GID
stored in the ipoib_neigh is different than the one present in the
neighbour (handle gratuitous ARP) or that a bonding fail over has
happened but the neighbour still has a pointer to an ipoib_neigh
created by a different device than the current slave.  This will cause
the driver to apply the check also for connected mode neighbours.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:43 -08:00
James Bottomley
d3f46f39b7 [SCSI] remove use_sg_chaining
With the sg table code, every SCSI driver is now either chain capable
or broken (or has sg_tablesize set so chaining is never activated), so
there's no need to have a check in the host template.

Also tidy up the code by moving the scatterlist size defines into the
SCSI includes and permit the last entry of the scatterlist pools not
to be a power of two.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-30 13:14:02 -06:00
Linus Torvalds
9b73e76f3c Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200 commits)
  [SCSI] usbstorage: use last_sector_bug flag universally
  [SCSI] libsas: abstract STP task status into a function
  [SCSI] ultrastor: clean up inline asm warnings
  [SCSI] aic7xxx: fix firmware build
  [SCSI] aacraid: fib context lock for management ioctls
  [SCSI] ch: remove forward declarations
  [SCSI] ch: fix device minor number management bug
  [SCSI] ch: handle class_device_create failure properly
  [SCSI] NCR5380: fix section mismatch
  [SCSI] sg: fix /proc/scsi/sg/devices when no SCSI devices
  [SCSI] IB/iSER: add logical unit reset support
  [SCSI] don't use __GFP_DMA for sense buffers if not required
  [SCSI] use dynamically allocated sense buffer
  [SCSI] scsi.h: add macro for enclosure bit of inquiry data
  [SCSI] sd: add fix for devices with last sector access problems
  [SCSI] fix pcmcia compile problem
  [SCSI] aacraid: add Voodoo Lite class of cards.
  [SCSI] aacraid: add new driver features flags
  [SCSI] qla2xxx: Update version number to 8.02.00-k7.
  [SCSI] qla2xxx: Issue correct MBC_INITIALIZE_FIRMWARE command.
  ...
2008-01-25 17:19:08 -08:00
Jan Engelhardt
1cf18d5aab IPoIB: Constify seq_operations function pointer tables
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:46 -08:00
Krishna Kumar
48fe5e594c IPoIB: Remove redundant check of netif_queue_stopped() in xmit handler
qdisc_run() now tests for queue_stopped() before calling
__qdisc_run(), and the same check is done in every iteration of
__qdisc_run(), so another check is not required in the driver xmit.
This means that ipoib_start_xmit() no longer needs to test
netif_queue_stopped(); the test was added to fix earlier kernels,
where the networking stack did not guarantee that the xmit method of
an LLTX driver would not be called after the queue was stopped, but
current kernels do provide this guarantee.

To validate, I put a debug in the TX_BUSY path which never hit with 64
threads running overnight exercising this code a few 100 million
times.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:44 -08:00
Erez Zilber
6410627eb9 IB/iser: Add change_queue_depth method
Add a .change_queue_depth handler to the scsi_host_template in the
iSER driver.  iscsi_change_queue_depth was added to iscsi_tcp in order
to solve the problem of queue depth which was too high for some
targets.  It is also applicable for iSER.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:43 -08:00
Erez Zilber
a4ef1451df IB/iser: Print information about unhandled RDMA CM events
Some RDMA CM events are not supported or not handled in iSER.
This patch adds some info (printk) for the user about them.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:43 -08:00
David Dillow
7aa54bd730 IB/srp: Add identifying information to log messages
When you have multiple targets, it gets really confusing when you try
to track down who did a reset when there is no identifying information
in the log message, especially when the same extension ID is mapped
through two different local IB ports.  So, add an identifier that can
be used to track back to which local IB port/remote target pair is the
one having problems.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Acked-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:38 -08:00
Pradeep Satyanarayana
586a693448 IPoIB/CM: Enable SRQ support on HCAs that support fewer than 16 SG entries
Some HCAs (such as ehca2) support SRQ, but only support fewer than 16 SG
entries for SRQs.  Currently IPoIB/CM implicitly assumes all HCAs will
support 16 SG entries for SRQs (to handle a 64K MTU with 4K pages). This
patch removes that restriction by limiting the maximum MTU in connected
mode to what the maximum number of SRQ SG entries allows.

This patch addresses <https://bugs.openfabrics.org/show_bug.cgi?id=728>

Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
David Dillow
fff09a8e6e IB/srp: Enable SG list chaining
By default, the SCSI mid-layer seems to send down 512KB requests
(sg_tablesize = 256), with some requests occasionally combined. By
allowing the mid-layer to chain requests, we can easily grow to 1024KB
or larger -- I've tested 4096KB I/O requests with no problems.

I looked through the DMA paths on the hardware drivers to ensure they
could take advantage of the SG chaining, and it seems that every one
except ipath uses the system's DMA routines, which have been converted
to handle chaining.  ipath looks like it should be OK, but I have no
way to test it.

Signed-off-by: David Dillow <dillowda@ornl.gov>

[ Tested on ipath.  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
David Dillow
8cba207732 IB/srp: Respect target credit limit
The current SRP initiator will send requests even if it has no credits
available.  The results of sending extra requests are vendor specific,
but on some devices, overrunning credits will cost 85% of peak
performance -- e.g. 100 MB/s vs 720 MB/s.  Other devices may just drop
the requests.

This patch will tell the SCSI midlayer to queue requests if there are
fewer than two credits remaining, and will not issue a task management
request if there are no credits remaining.  The mid-layer will retry
the queued command once an outstanding command completes.

The patch also removes the unlikely() in __srp_get_tx_iu(), as it is
not at all unlikely to hit this limit under heavy load.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
Rolf Manderscheid
a9e527e3f9 IPoIB: improve IPv4/IPv6 to IB mcast mapping functions
An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs.  The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised.  This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.

The next step will be to add a way to configure the scope for an IPoIB
interface.

Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
Oliver Pinter
38dc732f47 IB/iser: Typo fix (s/destory/destroy/)
Signed-off-by: Oliver Pinter <oliver.pntr@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:32 -08:00
Erez Zilber
bd5d7a8585 IB/iser: update URLs of iSER docs
Signed-off-by: Erez Zilber <erezz@voltaire.com>
2008-01-25 14:15:32 -08:00
Joe Perches
908cf9a565 drivers/infiniband: Add missing "space"
Add missing spaces in the middle of format strings.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:26 -08:00
Pradeep Satyanarayana
68e995a295 IPoIB/cm: Add connected mode support for devices without SRQs
Some IB adapters (notably IBM's eHCA) do not implement SRQs (shared
receive queues).  The current IPoIB connected mode support only works
on devices that support SRQs.

Fix this by adding support for using the receive queue of each
connected mode receive QP.  The disadvantage of this compared to using
an SRQ is that it means a full queue of receives must be posted for
each remote connected mode peer, which means that total memory usage
is potentially much higher than when using SRQs.  To manage this, add
a new module parameter "max_nonsrq_conn_qp" that limits the number of
connections allowed per interface.

The rest of the changes are fairly straightforward: we use a table of
struct ipoib_cm_rx to hold all the active connections, and put the
table index of the connection in the high bits of receive WR IDs.
This is needed because we cannot rely on the struct ib_wc.qp field for
non-SRQ receive completions.  Most of the rest of the changes just
test whether or not an SRQ is available, and post receives or find
received packets in the right place depending on the answer.

Cleaning up dead connections actually becomes simpler, because we do
not have to do the "last WQE reached" dance that is required to
destroy QPs attached to an SRQ.  We just move the QP to the error
state and wait for all pending receives to be flushed.

Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>

[ Completely rewritten and split up, based on Pradeep's work.  Several
  bugs fixed and no doubt several bugs introduced.  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:24 -08:00
Roland Dreier
efcd99717f IPoIB/cm: Factor out ipoib_cm_free_rx_reap_list()
Factor out the code for going through the rx_reap list of struct
ipoib_cm_rx and freeing each one.  This consolidates the code
duplicated between ipoib_cm_dev_stop() and ipoib_cm_rx_reap() and
reduces the risk of error when adding additional accounting.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:24 -08:00
Roland Dreier
7b3687df66 IPoIB/cm: Factor out ipoib_cm_create_srq()
Factor out the code to create an SRQ and allocate the receive ring in
ipoib_cm_dev_init() into a new function ipoib_cm_create_srq().  This
will make the code neater when support for devices that don't implement
SRQs is added.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:24 -08:00
Roland Dreier
1efb61444c IPoIB/cm: Factor out ipoib_cm_free_rx_ring()
Factor out the code to unmap/free skbs and free the receive ring in
ipoib_cm_dev_cleanup() into a new function ipoib_cm_free_rx_ring().
This function will be called from a couple of other places when
support for devices that don't implement SRQs is added.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:24 -08:00
Roland Dreier
2337f80941 IPoIB: Trivial formatting cleanups
Fix whitespace blunders, convert "foo* bar" to "foo *bar", etc.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:23 -08:00
Erez Zilber
90c18f3c28 [SCSI] IB/iSER: add logical unit reset support
eh_device_reset_handler was already added to scsi_host_template
in iscsi_tcp, and is now added also for iscsi_iser.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-23 13:39:43 -06:00
Olaf Kirch
a8ac6311cc [SCSI] iscsi: convert xmit path to iscsi chunks
Convert xmit to iscsi chunks.

from michaelc@cs.wisc.edu:

Bug fixes, more digest integration, sg chaining conversion and other
sg wrapper changes, coding style sync up, and removal of io fields,
like pdu_sent, that are not needed.

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-11 18:28:42 -06:00
Mike Christie
f6d5180c78 [SCSI] libiscsi: fix nop handling
During root boot and shutdown the target could send us nops.
At this time iscsid cannot be running, so the target will drop
the session and the boot or shutdown will hang.

To handle this and allow us to better control when to check the network
this patch moves the nop handling to the kernel.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-11 18:28:35 -06:00
Mike Christie
b3a7ea8d50 [SCSI] libiscsi: do not block session during logout
There is not need to block the session during logout. Since
we are going to fail the commands that were blocked just fail them
immediately instead.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-11 18:28:28 -06:00
Boaz Harrosh
38ad03de3f [SCSI] libiscsi,iser: patch for AHS support
- The default initialization of hdr_max is the minimum -
    sizeof(struct iscsi_cmd) - Once this patch goes into iser the default
    initialization at libiscsi can be removed.
  - This is not yet full support for AHSs at iser end. But it should be easy.
    Just allocate more space at iser_desc right after iscsi_hdr. Than
    at transmission time use ctask->hdr_len to retrieve the total
    size of all iscsi pdu headers. See previous patch at iscsi_tcp.[ch]

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-11 18:28:25 -06:00
Mike Christie
843c0a8a76 [SCSI] libiscsi, iscsi_tcp: add device support
This patch adds logical unit reset support. This should work for ib_iser,
but I have not finished testing that driver so it is not hooked in yet.

This patch also temporarily reverts the iscsi_tcp r2t write out patch.
That code is completely rewritten in this patchset.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-11 18:28:19 -06:00
Dave Dillow
ad696989b4 IB/srp: Release transport before removing host
The documented call sequence for removing a host is to call the
transport xxx_remove_host() prior to scsi_remove_host(). The SRP
transport used to crash when that order was followed, but as it is now
fixed, use the documented order.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-08 12:08:10 -08:00
David Dillow
b0e47c8b79 IB/srp: Fix list corruption/oops on module reload
Add a missing call to srp_remove_host() in srp_remove_one() so that we 
don't leak SRP transport class list entries.

Tested-by: David Dillow <dillowda@ornl.gov>
Acked-by: FUJITA Tomonori <tomof@acm.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-03 10:25:27 -08:00
Jack Morgenstein
1401b53acc IPoIB: Fix oops if xmit is called when priv->broadcast is NULL
If a port goes down, ipoib_ib_dev_down() is invoked -- which flushes
the mcasts (clearing priv->broadcast) and clearing the path record
cache.  If ipoib_start_xmit() is then invoked (before the broadcast
group is rejoined), a kernel oops results from attempting to access
priv->broadcast, which is still unset.

Returning NULL from path_rec_create() if priv->broadcast is NULL is a
harmless way of bypassing the problem -- the offending packet is
simply discarded "without prejudice."

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-27 15:40:10 -08:00
Erez Zilber
a316b79c33 IB/iser: Add missing counter increment in iser_data_buf_aligned_len()
While adding sg chaining support to iSER, a "for" loop was replaced
with a "for_each_sg" loop. The "for" loop included the incrementation
of 2 variables. Only one of them is incremented in the current
"for_each_sg" loop. This caused iSER to think that all data is
unaligned, and all data was copied to aligned buffers.

This patch increments the missing counter inside the "for_each_sg"
loop whenever necessary.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-24 13:50:39 -08:00
Roland Dreier
09f60f8f54 IPoIB/cm: Fix receive QP cleanup
Commit 1b524963 ("IPoIB/cm: Use common CQ for CM send completions")
changed how the high-order bits of work request IDs were used, which
had the effect that IPOIB_CM_RX_DRAIN_WRID was no longer handled as a
connected mode receive completion.  This leads to the messages

    ib1: cm send completion event with wrid 1073741823 (> 64)
    ib1: RX drain timing out

when an interface with connected mode QPs is brought down.  Fix this
by making sure that both IPOIB_OP_CM and IPOIB_OP_RECV are set in
IPOIB_CM_RX_DRAIN_WRID.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-26 13:44:25 -07:00
Linus Torvalds
0b776eb542 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  mlx4_core: Increase command timeout for INIT_HCA to 10 seconds
  IPoIB/cm: Use common CQ for CM send completions
  IB/uverbs: Fix checking of userspace object ownership
  IB/mlx4: Sanity check userspace send queue sizes
  IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))"
  IB/ehca: Enable large page MRs by default
  IB/ehca: Change meaning of hca_cap_mr_pgsize
  IB/ehca: Fix ehca_encode_hwpage_size() and alloc_fmr()
  IB/ehca: Fix masking error in {,re}reg_phys_mr()
  IB/ehca: Supply QP token for SRQ base QPs
  IPoIB: Use round_jiffies() for ah_reap_task
  RDMA/cma: Fix deadlock destroying listen requests
  RDMA/cma: Add locking around QP accesses
  IB/mthca: Avoid alignment traps when writing doorbells
  mlx4_core: Kill mlx4_write64_raw()
2007-10-23 09:56:11 -07:00
Jens Axboe
45711f1af6 [SG] Update drivers to use sg helpers
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-22 21:19:53 +02:00
Michael S. Tsirkin
1b524963fd IPoIB/cm: Use common CQ for CM send completions
Use the same CQ for CM send completions as for all other IPoIB
completions.  This means all completions are processed via the same
NAPI polling routine.  This should help reduce the number of
interrupts for bi-directional traffic (such as TCP) and fixes "driver
is hogging interrupts" errors reported for IPoIB send side, e.g.
<https://bugs.openfabrics.org/show_bug.cgi?id=508>

To do this, keep a per-interface counter of outstanding send WRs, and
stop the interface when this counter reaches the send queue size to
avoid CQ overruns.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-19 21:39:34 -07:00
Roland Dreier
fd312561ad IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))"
It's too hard to figure out what "!likely(...)" really means, and who
knows how compilers interpret the hint.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-17 21:54:44 -07:00
Anton Blanchard
69fc507a14 IPoIB: Use round_jiffies() for ah_reap_task
Use round_jiffies() to align the 1 second ah_reap_task with other work
and potentially save power by sleeping cores for longer.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-16 12:28:56 -07:00
Jens Axboe
53d412fce0 infiniband: sg chaining support
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:20:59 +02:00
Moni Shoua
200d1713b4 IB/ipoib: Verify address handle validity on send
When the bonding device senses a carrier loss of its active slave it replaces
that slave with a new one. In between the times when the carrier of an IPoIB
device goes down and ipoib_neigh is destroyed, it is possible that the
bonding driver will send a packet on a new slave that uses an old ipoib_neigh.
This patch detects and prevents this from happenning.

Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Acked-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-15 14:20:45 -04:00
Moni Shoua
732a2170f4 IB/ipoib: Bound the net device to the ipoib_neigh structue
IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
call.

When using the bonding driver, neighbours are created by the net stack on behalf
of the bonding (master) device. On the tx flow the bonding code gets an skb such
that skb->dev points to the master device, it changes this skb to point on the
slave device and calls the slave hard_start_xmit function.

Under this scheme, ipoib_neigh_destructor assumption that for each struct
neighbour it gets, n->dev is an ipoib device and hence netdev_priv(n->dev)
can be casted to struct ipoib_dev_priv is buggy.

To fix it, this patch adds a dev field to struct ipoib_neigh which is used
instead of the struct neighbour dev one, when n->dev->flags has the
IFF_MASTER bit set.

Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Acked-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-15 14:20:45 -04:00
Linus Torvalds
df3d80f5a5 Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (207 commits)
  [SCSI] gdth: fix CONFIG_ISA build failure
  [SCSI] esp_scsi: remove __dev{init,exit}
  [SCSI] gdth: !use_sg cleanup and use of scsi accessors
  [SCSI] gdth: Move members from SCp to gdth_cmndinfo, stage 2
  [SCSI] gdth: Setup proper per-command private data
  [SCSI] gdth: Remove gdth_ctr_tab[]
  [SCSI] gdth: switch to modern scsi host registration
  [SCSI] gdth: gdth_interrupt() gdth_get_status() & gdth_wait() fixes
  [SCSI] gdth: clean up host private data
  [SCSI] gdth: Remove virt hosts
  [SCSI] gdth: Reorder scsi_host_template intitializers
  [SCSI] gdth: kill gdth_{read,write}[bwl] wrappers
  [SCSI] gdth: Remove 2.4.x support, in-kernel changelog
  [SCSI] gdth: split out pci probing
  [SCSI] gdth: split out eisa probing
  [SCSI] gdth: split out isa probing
  gdth: Make one abuse of scsi_cmnd less obvious
  [SCSI] NCR5380: Use scsi_eh API for REQUEST_SENSE invocation
  [SCSI] usb storage: use scsi_eh API in REQUEST_SENSE execution
  [SCSI] scsi_error: Refactoring scsi_error to facilitate in synchronous REQUEST_SENSE
  ...
2007-10-15 08:19:33 -07:00
FUJITA Tomonori
aebd5e476e [SCSI] transport_srp: add rport roles attribute
This adds a 'roles' attribute to rport like transport_fc. The role can
be initiator or target. That is, the initiator driver creates target
remote ports and the target driver creates initiator remote ports.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-10-12 14:37:46 -04:00
FUJITA Tomonori
3236822b1c [SCSI] ib_srp: convert to use the srp transport class
This converts ib_srp to use the srp transport class.

I don't have ib hardware so I've not tested this patch.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-10-12 14:37:42 -04:00
Linus Torvalds
ce9d3c9a6a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (87 commits)
  mlx4_core: Fix section mismatches
  IPoIB: Allow setting policy to ignore multicast groups
  IB/mthca: Mark error paths as unlikely() in post_srq_recv functions
  IB/ipath: Minor fix to ordering of freeing and zeroing of tid pages.
  IB/ipath: Remove redundant link state checks
  IB/ipath: Fix IB_EVENT_PORT_ERR event
  IB/ipath: Better handling of unexpected GPIO interrupts
  IB/ipath: Maintain active time on all chips
  IB/ipath: Fix QHT7040 serial number check
  IB/ipath: Indicate a couple of chip bugs to userspace
  IB/ipath: iba6110 rev4 no longer needs recv header overrun workaround
  IB/ipath: Use counters in ipath_poll and cleanup interrupts in ipath_close
  IB/ipath: Remove duplicate copy of LMC
  IB/ipath: Add ability to set the LMC via the sysfs debugging interface
  IB/ipath: Optimize completion queue entry insertion and polling
  IB/ipath: Implement IB_EVENT_QP_LAST_WQE_REACHED
  IB/ipath: Generate flush CQE when QP is in error state
  IB/ipath: Remove redundant code
  IB/ipath: Future proof eeprom checksum code (contents reading)
  IB/ipath: UC RDMA WRITE with IMMEDIATE doesn't send the immediate
  ...
2007-10-11 19:43:13 -07:00
Roland Dreier
9153f66a5b IPoIB: Fix unused variable warning
The conversion to use netdevice internal stats left an unused variable
in ipoib_neigh_free(), since there's no longer any reason to get
netdev_priv() in order to increment dropped packets.  Delete the
unused priv variable.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-10 16:55:30 -07:00
Roland Dreier
de90351219 [IPoIB]: Convert to netdevice internal stats
Use the stats member of struct netdevice in IPoIB, so we can save
memory by deleting the stats member of struct ipoib_dev_priv, and save
code by deleting ipoib_get_stats().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:41 -07:00
Stephen Hemminger
3b04ddde02 [NET]: Move hardware header operations out of netdevice.
Since hardware header operations are part of the protocol class
not the device instance, make them into a separate object and
save memory.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:52 -07:00
Ralf Baechle
10d024c1b2 [NET]: Nuke SET_MODULE_OWNER macro.
It's been a useless no-op for long enough in 2.6 so I figured it's time to
remove it.  The number of people that could object because they're
maintaining unified 2.4 and 2.6 drivers is probably rather small.

[ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ]

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:51:13 -07:00
Stephen Hemminger
bea3348eef [NET]: Make NAPI polling independent of struct net_device objects.
Several devices have multiple independant RX queues per net
device, and some have a single interrupt doorbell for several
queues.

In either case, it's easier to support layouts like that if the
structure representing the poll is independant from the net
device itself.

The signature of the ->poll() call back goes from:

	int foo_poll(struct net_device *dev, int *budget)

to

	int foo_poll(struct napi_struct *napi, int budget)

The caller is returned the number of RX packets processed (or
the number of "NAPI credits" consumed if you want to get
abstract).  The callee no longer messes around bumping
dev->quota, *budget, etc. because that is all handled in the
caller upon return.

The napi_struct is to be embedded in the device driver private data
structures.

Furthermore, it is the driver's responsibility to disable all NAPI
instances in it's ->stop() device close handler.  Since the
napi_struct is privatized into the driver's private data structures,
only the driver knows how to get at all of the napi_struct instances
it may have per-device.

With lots of help and suggestions from Rusty Russell, Roland Dreier,
Michael Chan, Jeff Garzik, and Jamal Hadi Salim.

Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.

[ Ported to current tree and all drivers converted.  Integrated
  Stephen's follow-on kerneldoc additions, and restored poll_list
  handling to the old style to fix mutual exclusion issues.  -DaveM ]

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:47:45 -07:00
Or Gerlitz
335a64a5a9 IPoIB: Allow setting policy to ignore multicast groups
The kernel IB stack allows (through the RDMA CM) userspace
applications to join and use multicast groups from the IPoIB MGID
range.  This allows multicast traffic to be handled directly from
userspace QPs, without going through the kernel stack, which gives
better performance for some applications.

However, to fully interoperate with IP multicast, such userspace
applications need to participate in IGMP reports and queries, or else
routers may not forward the multicast traffic to the system where the
application is running.  The simplest way to do this is to share the
kernel IGMP implementation by using the IP_ADD_MEMBERSHIP option to
join multicast groups that are being handled directly in userspace.

However, in such cases, the actual multicast traffic should not also
be handled by the IPoIB interface, because that would burn resources
handling multicast packets that will just be discarded in the kernel.

To handle this, this patch adds lookup on the database used for IB
multicast group reference counting when IPoIB is joining multicast
groups, and if a multicast group is already handled by user space,
then the IPoIB kernel driver ignores the group.  This is controlled by
a per-interface policy flag.  When the flag is set, IPoIB will not
join and attach its QP to a multicast group which already has an entry
in the database; when the flag is cleared, IPoIB will behave as before
this change.

For each IPoIB interface, the /sys/class/net/$intf/umcast attribute
controls the policy flag.  The default value is off/0.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-10 13:02:30 -07:00
Dotan Barak
ede6bc04f3 IPoIB/cm: Clean up initialization of QP attr in ipoib_cm_create_tx_qp()
Make the way QP is being created in ipoib_cm_create_tx_qp()
consistent with ipoib_cm_create_rx_qp().

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:18 -07:00
Roland Dreier
ec2a1344ad IB/iser: Remove unnecessary includes
<asm/scatterlist.h> is not needed because everyplace it appears,
<linux/scatterlist.h> also appears.  <asm/io.h> is not needed because
nothing seems to be using device IO anyway.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:13 -07:00
Sean Hefty
247e020ee5 IB/srp: Add QoS support through service ID
Provide the target service ID when performing a path record query to
support optional QoS capability.  QoS requires support from the SA.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:12 -07:00
Sean Hefty
81668838c4 IPoIB: Specify Traffic Class with path record queries for QoS support
To support QoS within and between subnets, modify IPoIB to request
specific Traffic Class values with path record queries, using
the value associated with the IPoIB broadcast group.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ See some comments I made on this at v1 and v2 of the posts
  <http://lists.openfabrics.org/pipermail/general/2007-August/039275.html>
  <http://lists.openfabrics.org/pipermail/general/2007-September/040312.html> ]

Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:11 -07:00
Eli Cohen
ca6de177ac IPoIB: Fix error path memory leak
Clean up properly if ib_query_pkey() or ib_query_gid() fail.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:06 -07:00
Eli Cohen
b3ac60fc24 IPoIB: Fix typo to end statement with ';' instead of ','
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:06 -07:00