To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.
Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pages that were mapped using ib_dma_map_page() should be unmapped
using ib_dma_unmap_page().
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Currently, there is only a single ("basic") type of SRQ, but with XRC
support we will add a second. Prepare for this by defining an SRQ type
and setting all current users to IB_SRQT_BASIC.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Use new function ib_rate_to_mbps() to handle printing rate in debugfs,
so that we handle extended rates.
Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The iscsi class currently does not support writable sysfs
attrs for LLD sysfs settings. This patch converts the
iscsi class and driver's host attrs to use the attribute
container sysfs group and the sysfs group's is_visible callout
to be able to support readable or writable sysfs attrs.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The iscsi class currently does not support writable sysfs
attrs for LLD sysfs settings. This patch converts the
iscsi class and driver's session attrs to use the attribute
container sysfs group and the sysfs group's is_visible callout
to be able to support readable or writable sysfs attrs.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The iscsi class currently does not support writable sysfs
attrs for LLD sysfs settings. This patch converts the
iscsi class and drivers to use the attribute container
sysfs group and the sysfs group's is_visible callout
to be able to support readable or writable sysfs attrs.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC3270 mandates that iSCSI PDUs are padded to the closest integer
number of four byte words. Fix the iser code to support that on both
the TX/RX flows.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The code that prepares the SG associated with SCSI command for FMR was
buggy for systems with DMA addresses that don't fit in unsigned long,
e.g under the 32-bit based XenServer dom0 sizeof(dma_addr_t) is 8.
Fix that by casting to unsigned long long a masking constant used by
the code. This resolves a crash in iser_sg_to_page_vec on this system.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: Convert to DIV_ROUND_UP_SECTOR_T usage for sectors / dev_max_sectors
kernel.h: Add DIV_ROUND_UP_ULL and DIV_ROUND_UP_SECTOR_T macro usage
iscsi-target: Add iSCSI fabric support for target v4.1
iscsi: Add Serial Number Arithmetic LT and GT into iscsi_proto.h
iscsi: Use struct scsi_lun in iscsi structs instead of u8[8]
iscsi: Resolve iscsi_proto.h naming conflicts with drivers/target/iscsi
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>
Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch renames the following iscsi_proto.h structures to avoid
namespace issues with drivers/target/iscsi/iscsi_target_core.h:
*) struct iscsi_cmd -> struct iscsi_scsi_req
*) struct iscsi_cmd_rsp -> struct iscsi_scsi_rsp
*) struct iscsi_login -> struct iscsi_login_req
This patch includes useful ISCSI_FLAG_LOGIN_[CURRENT,NEXT]_STAGE*,
and ISCSI_FLAG_SNACK_TYPE_* definitions used by iscsi_target_mod, and
fixes the incorrect definition of struct iscsi_snack to following
RFC-3720 Section 10.16. SNACK Request.
Also, this patch updates libiscsi, iSER, be2iscsi, and bn2xi to
use the updated structure definitions in a handful of locations.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits)
IB/qib: Defer HCA error events to tasklet
mlx4_core: Bump the driver version to 1.0
RDMA/cxgb4: Use printk_ratelimited() instead of printk_ratelimit()
IB/mlx4: Support PMA counters for IBoE
IB/mlx4: Use flow counters on IBoE ports
IB/pma: Add include file for IBA performance counters definitions
mlx4_core: Add network flow counters
mlx4_core: Fix location of counter index in QP context struct
mlx4_core: Read extended capabilities into the flags field
mlx4_core: Extend capability flags to 64 bits
IB/mlx4: Generate GID change events in IBoE code
IB/core: Add GID change event
RDMA/cma: Don't allow IPoIB port space for IBoE
RDMA: Allow for NULL .modify_device() and .modify_port() methods
IB/qib: Update active link width
IB/qib: Fix potential deadlock with link down interrupt
IB/qib: Add sysfs interface to read free contexts
IB/mthca: Remove unnecessary read of PCI_CAP_ID_EXP
IB/qib: Remove double define
IB/qib: Remove unnecessary read of PCI_CAP_ID_EXP
...
SCSI scanning of a channel🆔lun triplet in Linux works as follows
(function scsi_scan_target() in drivers/scsi/scsi_scan.c):
- If lun == SCAN_WILD_CARD, send a REPORT LUNS command to the target
and process the result.
- If lun != SCAN_WILD_CARD, send an INQUIRY command to the LUN
corresponding to the specified channel🆔lun triplet to verify
whether the LUN exists.
So a SCSI driver must either take the channel and target id values in
account in its quecommand() function or it should declare that it only
supports one channel and one target id.
Currently the ib_srp driver does neither. As a result scanning the
SCSI bus via e.g. rescan-scsi-bus.sh causes many duplicate SCSI
devices to be created. For each 0:0:L device, several duplicates are
created with the same LUN number and with (C:I) != (0:0). Fix this by
declaring that the ib_srp driver only supports one channel and one
target id.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@kernel.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
* remove interrupt.g inclusion from netdevice.h -- not needed
* fixup fallout, add interrupt.h and hardirq.h back where needed.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
RDMA/cma: Save PID of ID's owner
RDMA/cma: Add support for netlink statistics export
RDMA/cma: Pass QP type into rdma_create_id()
RDMA: Update exported headers list
RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h>
RDMA/nes: Add a check for strict_strtoul()
RDMA/cxgb3: Don't post zero-byte read if endpoint is going away
RDMA/cxgb4: Use completion objects for event blocking
IB/srp: Fix integer -> pointer cast warnings
IB: Add devnode methods to cm_class and umad_class
IB/mad: Return EPROTONOSUPPORT when an RDMA device lacks the QP required
IB/uverbs: Add devnode method to set path/mode
RDMA/ucma: Add .nodename/.mode to tell userspace where to create device node
RDMA: Add netlink infrastructure
RDMA: Add error handling to ib_core_init()
The RDMA CM currently infers the QP type from the port space selected
by the user. In the future (eg with RDMA_PS_IB or XRC), there may not
be a 1-1 correspondence between port space and QP type. For netlink
export of RDMA CM state, we want to export the QP type to userspace,
so it is cleaner to explicitly associate a QP type to an ID.
Modify rdma_create_id() to allow the user to specify the QP type, and
use it to make our selections of datagram versus connected mode.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fix
drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_handle_recv':
drivers/infiniband/ulp/srp/ib_srp.c:1150: warning: cast to pointer from integer of different size
drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_send_completion':
drivers/infiniband/ulp/srp/ib_srp.c🔢 warning: cast to pointer from integer of different size
by adding an intermediate cast to uintptr_t.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: David Dillow <dillowda@ornl.gov>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB: Increase DMA max_segment_size on Mellanox hardware
IB/mad: Improve an error message so error code is included
RDMA/nes: Don't print success message at level KERN_ERR
RDMA/addr: Fix return of uninitialized ret value
IB/srp: try to use larger FMR sizes to cover our mappings
IB/srp: add support for indirect tables that don't fit in SRP_CMD
IB/srp: rework mapping engine to use multiple FMR entries
IB/srp: allow sg_tablesize to be set for each target
IB/srp: move IB CM setup completion into its own function
IB/srp: always avoid non-zero offsets into an FMR
Now that we can get larger SG lists, we can take advantage of HCAs that
allow us to use larger FMR sizes. In many cases, we can use up to 512
entries, so start there and work our way down.
Signed-off-by: David Dillow <dillowda@ornl.gov>
This allows us to guarantee the ability to submit up to 8 MB requests
based on the current value of SCSI_MAX_SG_CHAIN_SEGMENTS. While FMR will
usually condense the requests into 8 SG entries, it is imperative that
the target support external tables in case the FMR mapping fails or is
not supported.
We add a safety valve to allow targets without the needed support to
reap the benefits of the large tables, but fail in a manner that lets
the user know that the data didn't make it to the device. The user must
add "allow_ext_sg=1" to the target parameters to indicate that the
target has the needed support.
If indirect_sg_entries is not specified in the modules options, then
the sg_tablesize for the target will default to cmd_sg_entries unless
overridden by the target options.
Signed-off-by: David Dillow <dillowda@ornl.gov>
Instead of forcing all of the S/G entries to fit in one FMR, and falling
back to indirect descriptors if that fails, allow the use of as many
FMRs as needed to map the request. This lays the groundwork for allowing
indirect descriptor tables that are larger than can fit in the command
IU, but should marginally improve performance now by reducing the number
of indirect descriptors needed.
We increase the minimum page size for the FMR pool to 4K, as larger
pages help increase the coverage of each FMR, and it is rare that the
kernel would send down a request with scattered 512 byte fragments.
This patch also move some of the target initialization code afte the
parsing of options, to keep it together with the new code that needs to
allocate memory based on the options given.
Signed-off-by: David Dillow <dillowda@ornl.gov>
Different configurations of target software allow differing max sizes of
the command IU. Allowing this to be changed per-target allows all
targets on an initiator to get an optimal setting.
We deprecate srp_sg_tablesize and replace it with cmd_sg_entries in
preparation for allowing more indirect descriptors than can fit in the
IU.
Signed-off-by: David Dillow <dillowda@ornl.gov>
It is unclear exactly how this code works around Mellanox SRP targets,
or if the problem is on the target side or in the HCA itself. In an
abundance of caution, we should always enable the workaround.
Signed-off-by: David Dillow <dillowda@ornl.gov>
This pactch has iser export the address and port
of the endpoint.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
is used to configure any non-standard kernel with a much larger scope than
only small devices.
This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
references to the option throughout the kernel. A new CONFIG_EMBEDDED
option is added that automatically selects CONFIG_EXPERT when enabled and
can be used in the future to isolate options that should only be
considered for embedded systems (RISC architectures, SLOB, etc).
Calling the option "EXPERT" more accurately represents its intention: only
expert users who understand the impact of the configuration changes they
are making should enable it.
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: David Woodhouse <david.woodhouse@intel.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Greg KH <gregkh@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Robin Holt <holt@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ib_wq is added, which is used as the common workqueue for infiniband
instead of the system workqueue. All system workqueue usages
including flush_scheduled_work() callers are converted to use and
flush ib_wq.
* cancel_delayed_work() + flush_scheduled_work() converted to
cancel_delayed_work_sync().
* qib_wq is removed and ib_wq is used instead.
This is to prepare for deprecation of flush_scheduled_work().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Merge the two tests in srp_queuecommand() of whether information unit
allocation succeeded into one. An intended side effect of this change
is that we fix the warning:
drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_queuecommand':
drivers/infiniband/ulp/srp/ib_srp.c:1116: warning: 'req' may be used uninitialized in this function
(seen with CONFIG_CC_OPTIMIZE_FOR_SIZE=y at least with gcc 4.4.4)
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
As a first step in moving from LRO to GRO, revert commit af40da894e
("IPoIB: add LRO support"). Also eliminate the ethtool set_flags
callback which isn't needed anymore. Finally, we need to include
<linux/sched.h> directly to get the declaration of restart_syscall()
(which used to be included implicitly through <linux/inet_lro.h>).
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Put the variables accessed together in the hot-path into common
cachelines, and separate them by RW vs RO to avoid false dirtying.
We keep a local copy of the lkey and rkey in the target to avoid
traversing pointers (and associated cache lines) to find them.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Dillow <dillowda@ornl.gov>
We don't need protection against the SCSI stack, so use our own lock to
allow parallel progress on separate CPUs.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out and small cleanups by David Dillow ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
We only need the lock to cover list and credit manipulations, so push
those into srp_remove_req() and update the call chains.
We reorder the request removal and command completion in
srp_process_rsp() to avoid the SCSI mid-layer sending another command
before we've released our request and added any credits returned by the
target. This prevents us from returning HOST_BUSY unneccesarily.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out, small cleanups, and modified to avoid potential extraneous
HOST_BUSY returns by David Dillow ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
We only need locks to protect our lists and number of credits available.
By pre-consuming the credit for the request, we can reduce our lock
coverage to just those areas. If we don't actually send the request,
we'll need to put the credit back into the pool.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out and small cleanups by David Dillow ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
We use req->scmnd != NULL to indicate an active request, so there's no
need to keep a separate list for them. We can afford the array iteration
during error handling, and dropping it gives us one less item that needs
lock protection.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out and small cleanups by David Dillow ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
Only one CPU at a time will own an RX IU, so using the address of the IU
as the work request cookie allows us to avoid taking a lock. We can
similarly prepare the TX path for lockless posting by moving the free TX
IUs to a list. This also removes the requirement that the queue sizes be
a power of 2.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ broken out, small cleanups, and modified to avoid needing an extra field
in the IU by David Dillow]
Signed-off-by: David Dillow <dillowda@ornl.gov>
We can only have one task management comment outstanding, so move the
completion and status to the target port. This allows us to handle
resets of a LUN without a corresponding request having been sent.
Meanwhile, we don't need to play games with host_scribble, just use it
as the pointer it is.
This fixes a crash when we issue a bus reset using sg_reset.
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=13893
Reported-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Dillow <dillowda@ornl.gov>
Move the mid-layer's ->queuecommand() invocation from being locked
with the host lock to being unlocked to facilitate speeding up the
critical path for drivers who don't need this lock taken anyway.
The patch below presents a simple SCSI host lock push-down as an
equivalent transformation. No locking or other behavior should change
with this patch. All existing bugs and locking orders are preserved.
Additionally, add one parameter to queuecommand,
struct Scsi_Host *
and remove one parameter from queuecommand,
void (*done)(struct scsi_cmnd *)
Scsi_Host* is a convenient pointer that most host drivers need anyway,
and 'done' is redundant to struct scsi_cmnd->scsi_done.
Minimal code disturbance was attempted with this change. Most drivers
needed only two one-line modifications for their host lock push-down.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Acked-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (63 commits)
IB/qib: clean up properly if pci_set_consistent_dma_mask() fails
IB/qib: Allow driver to load if PCIe AER fails
IB/qib: Fix uninitialized pointer if CONFIG_PCI_MSI not set
IB/qib: Fix extra log level in qib_early_err()
RDMA/cxgb4: Remove unnecessary KERN_<level> use
RDMA/cxgb3: Remove unnecessary KERN_<level> use
IB/core: Add link layer type information to sysfs
IB/mlx4: Add VLAN support for IBoE
IB/core: Add VLAN support for IBoE
IB/mlx4: Add support for IBoE
mlx4_en: Change multicast promiscuous mode to support IBoE
mlx4_core: Update data structures and constants for IBoE
mlx4_core: Allow protocol drivers to find corresponding interfaces
IB/uverbs: Return link layer type to userspace for query port operation
IB/srp: Sync buffer before posting send
IB/srp: Use list_first_entry()
IB/srp: Reduce number of BUSY conditions
IB/srp: Eliminate two forward declarations
IB/mlx4: Signal node desc changes to SM by using FW to generate trap 144
IB: Replace EXTRA_CFLAGS with ccflags-y
...
Use the new {max,min}3 macros to save some cycles and bytes on the stack.
This patch substitutes trivial nested macros with their counterpart.
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Joe Perches <joe@perches.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
srp_send_tsk_mgmt() was missing the proper DMA sync calls before posting
the buffer to the device.
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use the list_first_entry() macro in ib_srp instead of open-coding the equivalent,
which makes the source code slightly more descriptive. The list_first_entry()
macro itself was introduced in kernel 2.6.22.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
As proposed by the SRP (draft) standard, ib_srp reserves one ring
element for SRP_TSK_MGMT requests. This patch makes sure that the SCSI
mid-layer never tries to queue more than (SRP request limit) - 1 SCSI
commands to ib_srp. This improves performance for targets whose request
limit is less than or equal to SRP_NORMAL_REQ_SQ_SIZE by reducing the
number of BUSY responses reported by ib_srp to the SCSI mid-layer.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Use the net device's dev_id field to encode the port number of the pci
device. This can be used to to associate a net device with the pci
device's port. The encoding is: dev_id = port - 1.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch adds support for SRP_CRED_REQ to avoid a lockup by targets
that use that mechanism to return credits to the initiator. This
prevents a lockup observed in the field where we would never add the
credits from the SRP_CRED_REQ to our current count, and would therefore
never send another command to the target.
Minimal support for SRP_AER_REQ is also added, as these messages can
also be used to convey additional credits to the initiator.
Based upon extensive debugging and code by Bart Van Assche and a bug
report by Chris Worley.
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The transmit ring in ib_srp (srp_target.tx_ring) is currently only used
for allocating requests sent by the initiator to the target. This patch
prepares using that ring for allocation of both requests and responses.
Also, this patch differentiates the uses of SRP_SQ_SIZE, increases the
size of the IB send completion queue by one element and reserves one
transmit ring slot for SRP_TSK_MGMT requests.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The patch below updates broken web addresses in the kernel
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Dimitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Acked-by: Ben Pfaff <blp@cs.stanford.edu>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Reviewed-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
IPoIB is IP-over-Infiniband link layer. In the case of IBoE, the link
layer is Ethernet and IP can work directly over Ethernet, so disable
IPoIB for non-IB_LINK_LAYER_INFINIBAND ports.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
IGMP processing is broken because the IPOIB does not set the
skb->pkt_type the right way for multicast traffic. All incoming
packets are set to PACKET_HOST which means that igmp_recv() will
ignore the IGMP broadcasts/multicasts.
This in turn means that the IGMP timers are firing and are sending
information about multicast subscriptions unnecessarily. In a large
private network this can cause traffic spikes.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (42 commits)
IB/qib: Add missing <linux/slab.h> include
IB/ehca: Drop unnecessary NULL test
RDMA/nes: Fix confusing if statement indentation
IB/ehca: Init irq tasklet before irq can happen
RDMA/nes: Fix misindented code
RDMA/nes: Fix showing wqm_quanta
RDMA/nes: Get rid of "set but not used" variables
RDMA/nes: Read firmware version from correct place
IB/srp: Export req_lim via sysfs
IB/srp: Make receive buffer handling more robust
IB/srp: Use print_hex_dump()
IB: Rename RAW_ETY to RAW_ETHERTYPE
RDMA/nes: Fix two sparse warnings
RDMA/cxgb3: Make needlessly global iwch_l2t_send() static
IB/iser: Make needlessly global iser_alloc_rx_descriptors() static
RDMA/cxgb4: Add timeouts when waiting for FW responses
IB/qib: Fix race between qib_error_qp() and receive packet processing
IB/qib: Limit the number of packets processed per interrupt
IB/qib: Allow writes to the diag_counters to be able to clear them
IB/qib: Set cfgctxts to number of CPUs by default
...
Export req_lim via sysfs for debugging.
Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com>
Acked-by: David Dillow <dave@thedillows.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1443 commits)
phy/marvell: add 88ec048 support
igb: Program MDICNFG register prior to PHY init
e1000e: correct MAC-PHY interconnect register offset for 82579
hso: Add new product ID
can: Add driver for esd CAN-USB/2 device
l2tp: fix export of header file for userspace
can-raw: Fix skb_orphan_try handling
Revert "net: remove zap_completion_queue"
net: cleanup inclusion
phy/marvell: add 88e1121 interface mode support
u32: negative offset fix
net: Fix a typo from "dev" to "ndev"
igb: Use irq_synchronize per vector when using MSI-X
ixgbevf: fix null pointer dereference due to filter being set for VLAN 0
e1000e: Fix irq_synchronize in MSI-X case
e1000e: register pm_qos request on hardware activation
ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice
net: Add getsockopt support for TCP thin-streams
cxgb4: update driver version
cxgb4: add new PCI IDs
...
Manually fix up conflicts in:
- drivers/net/e1000e/netdev.c: due to pm_qos registration
infrastructure changes
- drivers/net/phy/marvell.c: conflict between adding 88ec048 support
and cleaning up the IDs
- drivers/net/wireless/ipw2x00/ipw2100.c: trivial ipw2100_pm_qos_req
conflict (registration change vs marking it static)
The current strategy in ib_srp for posting receive buffers is:
* Post one buffer after channel establishment.
* Post one buffer before sending an SRP_CMD or SRP_TSK_MGMT to the target.
As a result, only the first non-SRP_RSP information unit from the
target will be processed. If that first information unit is an
SRP_T_LOGOUT, it will be processed. On the other hand, if the
initiator receives an SRP_CRED_REQ or SRP_AER_REQ before it receives a
SRP_T_LOGOUT, the SRP_T_LOGOUT won't be processed.
We can fix this inconsistency by changing the strategy for posting
receive buffers to:
* Post all receive buffers after channel establishment.
* After a receive buffer has been consumed and processed, post it again.
A side effect is that the ib_post_recv() call is moved out of the SCSI
command processing path. Since __srp_post_recv() is not called
directly any more, get rid of it and move the code directly into
srp_post_recv(). Also, move srp_post_recv() up in the file to avoid a
forward declaration.
Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com>
Acked-by: David Dillow <dave@thedillows.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Replace an open-coded dump of the receive buffer with a call to
print_hex_dump().
Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Sumeet Lahorani <sumeet.lahorani@oracle.com> reported that the IPoIB
child entries are world-writable; however we don't want ordinary users
to be able to create and destroy child interfaces, so fix them to be
writable only by root.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Following commit 1437ce3983 "ethtool:
Change ethtool_op_set_flags to validate flags", ethtool_op_set_flags
takes a third parameter and cannot be used directly as an
implementation of ethtool_ops::set_flags.
Changes nes and ipoib driver to pass in the appropriate value.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1674 commits)
qlcnic: adding co maintainer
ixgbe: add support for active DA cables
ixgbe: dcb, do not tag tc_prio_control frames
ixgbe: fix ixgbe_tx_is_paused logic
ixgbe: always enable vlan strip/insert when DCB is enabled
ixgbe: remove some redundant code in setting FCoE FIP filter
ixgbe: fix wrong offset to fc_frame_header in ixgbe_fcoe_ddp
ixgbe: fix header len when unsplit packet overflows to data buffer
ipv6: Never schedule DAD timer on dead address
ipv6: Use POSTDAD state
ipv6: Use state_lock to protect ifa state
ipv6: Replace inet6_ifaddr->dead with state
cxgb4: notify upper drivers if the device is already up when they load
cxgb4: keep interrupts available when the ports are brought down
cxgb4: fix initial addition of MAC address
cnic: Return SPQ credit to bnx2x after ring setup and shutdown.
cnic: Convert cnic_local_flags to atomic ops.
can: Fix SJA1000 command register writes on SMP systems
bridge: fix build for CONFIG_SYSFS disabled
ARCNET: Limit com20020 PCI ID matches for SOHARD cards
...
Fix up various conflicts with pcmcia tree drivers/net/
{pcmcia/3c589_cs.c, wireless/orinoco/orinoco_cs.c and
wireless/orinoco/spectrum_cs.c} and feature removal
(Documentation/feature-removal-schedule.txt).
Also fix a non-content conflict due to pm_qos_requirement getting
renamed in the PM tree (now pm_qos_request) in net/mac80211/scan.c
We shouldn't free things here because we free them later.
The call tree looks like this:
iser_connect() ==> initiating the connection establishment
and later
iser_cma_handler() => iser_route_handler() => iser_create_ib_conn_res()
if we fail here, eventually iser_conn_release() is called, resulting
in a double free.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The iser connection teardown flow isn't over until the underlying
Connection Manager (e.g the IB CM) delivers a disconnected or timeout
event through the RDMA-CM. When the remote (target) side isn't
reachable, e.g when some HW e.g port/hca/switch isn't functioning or
taken down administratively, the CM timeout flow is used and the event
may be generated only after relatively long time -- on the order of
tens of seconds.
The current iser code exposes this possibly long delay to higher
layers, specifically to the iscsid daemon and iscsi kernel stack. As a
result, the iscsi stack doesn't respond well: this low-level CM delay
is added to the fail-over time under HA schemes such as the one
provided by DM multipath through the multipathd(8) service.
This patch enhances the reference counting scheme on iser's IB
connections so that the disconnect flow initiated by iscsid from user
space (ep_disconnect) doesn't wait for the CM to deliver the
disconnect/timeout event. (The connection teardown isn't done from
iser's view point until the event is delivered)
The iser ib (rdma) connection object is destroyed when its reference
count reaches zero. When this happens on the RDMA-CM callback
context, extra care is taken so that the RDMA-CM does the actual
destroying of the associated ID, since doing it in the callback is
prohibited.
The reference count of iser ib connection normally reaches three,
where the <ref, deref> relations are
1. conn <init, terminate>
2. conn <bind, stop/destroy>
3. cma id <create, disconnect/error/timeout callbacks>
With this patch, multipath fail-over time is about 30 seconds, while
without this patch, multipath fail-over time is about 130 seconds.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The iscsi connection object life cycle includes binding and unbinding
(conn_stop) to/from the iscsi transport connection object. Since
iscsi connection objects are recycled, at the time the transport
connection (e.g iser's IB connection) is released, it is not valid to
touch the iscsi connection tied to the transport back-pointer since it
may already point to a different transport connection.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add handler to handle events such as port up and down. This is useful
when testing high-availability schemes such as multi-pathing.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Converts the list and the core manipulating with it to be the same as uc_list.
+uses two functions for adding/removing mc address (normal and "global"
variant) instead of a function parameter.
+removes dev_mcast.c completely.
+exposes netdev_hw_addr_list_* macros along with __hw_addr_* functions for
manipulation with lists on a sandbox (used in bonding and 80211 drivers)
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Finally this bit can be removed. Currently, after the bonding driver is
changed/fixed (32a806c194 net-next-2.6),
that's not possible for an addr with different length than dev->addr_len
to be present in list. Removing this check as in new mc_list there will be
no addrlen in the record.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (69 commits)
[SCSI] scsi_transport_fc: Fix synchronization issue while deleting vport
[SCSI] bfa: Update the driver version to 2.1.2.1.
[SCSI] bfa: Remove unused header files and did some cleanup.
[SCSI] bfa: Handle SCSI IO underrun case.
[SCSI] bfa: FCS and include file changes.
[SCSI] bfa: Modified the portstats get/clear logic
[SCSI] bfa: Replace bfa_get_attr() with specific APIs
[SCSI] bfa: New portlog entries for events (FIP/FLOGI/FDISC/LOGO).
[SCSI] bfa: Rename pport to fcport in BFA FCS.
[SCSI] bfa: IOC fixes, check for IOC down condition.
[SCSI] bfa: In MSIX mode, ignore spurious RME interrupts when FCoE ports are in FW mismatch state.
[SCSI] bfa: Fix Command Queue (CPE) full condition check and ack CPE interrupt.
[SCSI] bfa: IOC recovery fix in fcmode.
[SCSI] bfa: AEN and byte alignment fixes.
[SCSI] bfa: Introduce a link notification state machine.
[SCSI] bfa: Added firmware save clear feature for BFA driver.
[SCSI] bfa: FCS authentication related changes.
[SCSI] bfa: PCI VPD, FIP and include file changes.
[SCSI] bfa: Fix to copy fpma MAC when requested by user space application.
[SCSI] bfa: RPORT state machine: direct attach mode fix.
...
Print the return code of ib_post_send() if it fails to make these
debugging messages more useful.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The IPoIB UD QP reports send completions to priv->send_cq, which is
usually left unarmed; it only gets armed when the number of
outstanding send requests reaches the size of the TX queue. This
arming is done only in the send path for the UD QP. However, when
sending CM packets, the net queue may be stopped for the same reasons
but no measures are taken to recover the UD path from a lockup.
Consider this scenario: a host sends high rate of both CM and UD
packets, with a TX queue length of N. If at some time the number of
outstanding UD packets is more than N/2 and the overall outstanding
packets is N-1, and CM sends a packet (making the number of
outstanding sends equal N), the TX queue will be stopped. When all
the CM packets complete, the number of outstanding packets will still
be higher than N/2 so the TX queue will not be restarted.
Fix this by calling ib_req_notify_cq() when the queue is stopped in
the CM path.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (48 commits)
IB/srp: Clean up error path in srp_create_target_ib()
IB/srp: Split send and recieve CQs to reduce number of interrupts
RDMA/nes: Add support for KR device id 0x0110
IB/uverbs: Use anon_inodes instead of private infinibandeventfs
IB/core: Fix and clean up ib_ud_header_init()
RDMA/cxgb3: Mark RDMA device with CXIO_ERROR_FATAL when removing
RDMA/cxgb3: Don't allocate the SW queue for user mode CQs
RDMA/cxgb3: Increase the max CQ depth
RDMA/cxgb3: Doorbell overflow avoidance and recovery
IB/core: Pack struct ib_device a little tighter
IB/ucm: Clean whitespace errors
IB/ucm: Increase maximum devices supported
IB/ucm: Use stack variable 'base' in ib_ucm_add_one
IB/ucm: Use stack variable 'devnum' in ib_ucm_add_one
IB/umad: Clean whitespace
IB/umad: Increase maximum devices supported
IB/umad: Use stack variable 'base' in ib_umad_init_port
IB/umad: Use stack variable 'devnum' in ib_umad_init_port
IB/umad: Remove port_table[]
IB/umad: Convert *cdev to cdev in struct ib_umad_port
...
The iscsi_eh_target_reset has been modified to attempt
target reset only. If it fails, then iscsi_eh_session_reset
will be called.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Jayamohan Kallickal <jayamohank@serverengines.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Instead of repeating the error unwinding steps in each place an error
can be detected, use the common idiom of gotos into an error flow.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We can reduce the number of IB interrupts from two interrupts per
srp_queuecommand() call to one by using separate CQs for send and
receive completions and processing send completions by polling every
time a TX IU is allocated.
Receive completion events still trigger an interrupt.
Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Apparently bogus mc address can break IPOIB multicast processing. Therefore
returning the check for addrlen back until this is resolved in bonding (I don't
see any other point from where mc address with non-dev->addr_len length can came
from).
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to the loop complexicity in nes_nic.c, I'm using char* to copy mc addresses
to it.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the iSER receive completion flow takes the session lock
twice. Optimize it to avoid the first one by letting
iser_task_rdma_finalize() be called only from the cleanup_task
callback invoked by iscsi_free_task, thus reducing the contention on
the session lock between the scsi command submission to the scsi
command completion flows.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
libiscsi passthrough mode invokes the transport xmit calls directly
without first going through an internal queue, unlike the other mode,
which uses a queue and a xmitworker thread. Now that the "cant_sleep"
prerequisite of iscsi_host_alloc is met, move to use it. Handling
xmit errors is now done by the passthrough flow of libiscsi. Since
the queue/worker aren't used in this mode, the code that schedules the
xmitworker is removed.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>