Minor errors found via code inspection during future development.
SFF 8636 defines bit position 2 to hold the status indication of
QSFP memory paging. The mask used to test for the value was
incorrect and is fixed in this patch. Additionally, the dump
function had a mismatch between the field being printed out and
the field used to source the data which was fixed.
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reported-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
struct qib_mr requires the mr member be the last because struct
qib_mregion contains a dynamic array at the end. The additions
of members should have been placed before this structure as the
comment noted.
Failure to do so was causing random memory corruption. Reproducing
this bug was easy to do by running the client and server of
ib_write_bw -s 8 -n 5 on the same node.
This BUG() was tripped in a slab debug kernel:
kernel BUG at mm/slab.c:2572!
Fixes: 38071a461f ("IB/qib: Support the new memory registration API")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Merge second patch-bomb from Andrew Morton:
- most of the rest of MM
- procfs
- lib/ updates
- printk updates
- bitops infrastructure tweaks
- checkpatch updates
- nilfs2 update
- signals
- various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
dma-debug, dma-mapping, ...
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
ipc,msg: drop dst nil validation in copy_msg
include/linux/zutil.h: fix usage example of zlib_adler32()
panic: release stale console lock to always get the logbuf printed out
dma-debug: check nents in dma_sync_sg*
dma-mapping: tidy up dma_parms default handling
pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
kexec: use file name as the output message prefix
fs, seqfile: always allow oom killer
seq_file: reuse string_escape_str()
fs/seq_file: use seq_* helpers in seq_hex_dump()
coredump: change zap_threads() and zap_process() to use for_each_thread()
coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
signals: kill block_all_signals() and unblock_all_signals()
nilfs2: fix gcc uninitialized-variable warnings in powerpc build
nilfs2: fix gcc unused-but-set-variable warnings
MAINTAINERS: nilfs2: add header file for tracing
nilfs2: add tracepoints for analyzing reading and writing metadata files
...
__GFP_WAIT was used to signal that the caller was in atomic context and
could not sleep. Now it is possible to distinguish between true atomic
context and callers that are not willing to sleep. The latter should
clear __GFP_DIRECT_RECLAIM so kswapd will still wake. As clearing
__GFP_WAIT behaves differently, there is a risk that people will clear the
wrong flags. This patch renames __GFP_WAIT to __GFP_RECLAIM to clearly
indicate what it does -- setting it allows all reclaim activity, clearing
them prevents it.
[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No ULP uses it anymore, go ahead and remove it.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Support the new memory registration API by allocating a
private page list array in qib_mr and populate it when
qib_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by duplicating qib_fastreg_mr just take the needed information
from different places:
- page_size, iova, length (ib_mr)
- page array (qib_mr)
- key, access flags (ib_reg_wr)
The IB_WR_FAST_REG_MR handlers will be removed later when
all the ULPs will be converted.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch split up struct ib_send_wr so that all non-trivial verbs
use their own structure which embedds struct ib_send_wr. This dramaticly
shrinks the size of a WR for most common operations:
sizeof(struct ib_send_wr) (old): 96
sizeof(struct ib_send_wr): 48
sizeof(struct ib_rdma_wr): 64
sizeof(struct ib_atomic_wr): 96
sizeof(struct ib_ud_wr): 88
sizeof(struct ib_fast_reg_wr): 88
sizeof(struct ib_bind_mw_wr): 96
sizeof(struct ib_sig_handover_wr): 80
And with Sagi's pending MR rework the fast registration WR will also be
down to a reasonable size:
sizeof(struct ib_fastreg_wr): 64
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt]
Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc]
Tested-by: Haggai Eran <haggaie@mellanox.com>
Tested-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
With two exceptions (drm/qxl and drm/radeon) all vm_operations_struct
structs should be constant.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the hfi1 driver was added these definitions were moved from the qib driver
to ib_mad.h to be used by both qib and hfi1. They should have been moved to
ib_smi.h instead.
Fixes: d4ab347005 ("IB/core: Add core header changes needed for OPA")
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Applications must not assume that max_sge and max_sge_rd are the same,
Hence expose max_sge_rd correctly as well.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch adds the value of the CNP opcode to the existing list of enumerated
opcodes in ib_pack.h
Add common OPA header definitions for driver
build:
- opa_port_info.h
- opa_smi.h
- hfi1_user.h
Additionally, ib_mad.h, has additional definitions
that are common to ib_drivers including:
- trap support
- cca support
The qib driver has the duplication removed in favor
those in ib_mad.h
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: John, Jubin <jubin.john@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The lkey table is allocated with with a get_user_pages() with an
order based on a number of index bits from a module parameter.
The underlying kernel code cannot allocate that many contiguous pages.
There is no reason the underlying memory needs to be physically
contiguous.
This patch:
- switches the allocation/deallocation to vmalloc/vfree
- caps the number of bits to 23 to insure at least 1 generation bit
o this matches the module parameter description
Cc: stable@vger.kernel.org
Reviewed-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
We recently added BUG_ON's which were inappropriate for a condition which
should never happen. Change these to be WARN_ON_ONCE as a debugging aid.
Fixes: 4cd7c9479a ('IB/mad: Add support for additional MAD info to/from drivers')
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pull more vfs updates from Al Viro:
"Assorted VFS fixes and related cleanups (IMO the most interesting in
that part are f_path-related things and Eric's descriptor-related
stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
fs-cache series, DAX patches, Jan's file_remove_suid() work"
[ I'd say this is much more than "fixes and related cleanups". The
file_table locking rule change by Eric Dumazet is a rather big and
fundamental update even if the patch isn't huge. - Linus ]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
9p: cope with bogus responses from server in p9_client_{read,write}
p9_client_write(): avoid double p9_free_req()
9p: forgetting to cancel request on interrupted zero-copy RPC
dax: bdev_direct_access() may sleep
block: Add support for DAX reads/writes to block devices
dax: Use copy_from_iter_nocache
dax: Add block size note to documentation
fs/file.c: __fget() and dup2() atomicity rules
fs/file.c: don't acquire files->file_lock in fd_install()
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
vfs: avoid creation of inode number 0 in get_next_ino
namei: make set_root_rcu() return void
make simple_positive() public
ufs: use dir_pages instead of ufs_dir_pages()
pagemap.h: move dir_pages() over there
remove the pointless include of lglock.h
fs: cleanup slight list_entry abuse
xfs: Correctly lock inode when removing suid and file capabilities
fs: Call security_ops->inode_killpriv on truncate
fs: Provide function telling whether file_remove_privs() will do anything
...
In order to support alternate sized MADs (and variable sized MADs on OPA
devices) add in/out MAD size parameters to the process_mad core call.
In addition, add an out_mad_pkey_index to communicate the pkey index the driver
wishes the MAD stack to use when sending OPA MAD responses.
The out MAD size and the out MAD PKey index are required by the MAD
stack to generate responses on OPA devices.
Furthermore, the in and out MAD parameters are made generic by specifying them
as ib_mad_hdr rather than ib_mad.
Drivers are modified as needed and are protected by BUG_ON flags if the MAD
sizes passed to them is incorrect.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add max MAD size to the device immutable data set and have all drivers that
support MADs report the current IB MAD size (IB_MGMT_MAD_SIZE) to the core.
Verify MAD size data in both the MAD core and when reading the immutable data.
OPA drivers will report alternate MAD sizes in subsequent patches.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In preparation to support the new OPA MAD Base version, add a base version
parameter to ib_create_send_mad and set it to IB_MGMT_BASE_VERSION for current
users.
Definition of the new base version and it's processing will occur in later
patches.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Vendors should be able to pass vendor specific data to/from
user-space via query_device uverb. In order to do this,
we need to pass the vendors' specific udata.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add a new ib_cq_init_attr structure which contains the
previous cqe (minimum number of CQ entries) and comp_vector
(completion vector) in addition to a new flags field.
All vendors' create_cq callbacks are changed in order
to work with the new API.
This commit does not change any functionality.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> to patch #2
Signed-off-by: Doug Ledford <dledford@redhat.com>
The process_mad device function declares some parameters as "in". Make those
parameters const and adjust the call tree under process_mad in the various
drivers accordingly.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
As of commit 5eb620c81c "IB/core: Add helpers for uncached GID and P_Key
searches"; pkey_tbl_len and gid_tbl_len are immutable data which are stored in
the ib_device.
The per port core capability flags to be added later are also immutable data to
be stored in the ib_device object.
In preparation for this create a structure for per port immutable data and
place the pkey and gid table lengths within this structure.
"get_port_immutable" is added as a mandatory device function to allow the
drivers to fill in this data.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Commit d4988623cc ("IB/qib: use arch_phys_wc_add()")
adjusted mtrr inititialization to use the new interface.
Unfortunately, the new interface returns a signed
value and the patch tested the unsigned wc_cookie.
Fix the issue by changing the type of wc_cookie to int. For
the success case the ret left at zero to avoid
a warning from the caller. For failure wc_cookie
is used as the ret.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This driver already makes use of ioremap_wc() on PIO buffers,
so convert it to use arch_phys_wc_add().
The qib driver uses a mmap() special case for when PAT is
not used, this behaviour used to be determined with a
module parameter but since we have been asked to just
remove that module parameter this checks for the WC cookie,
if not set we can assume PAT was used. If its set we do
what we used to do for the mmap for when MTRR was enabled.
The removal of the module parameter is OK given that Andy
notes that even if users of module parameter are still around
it will not prevent loading of the module on recent kernels.
Cc: Doug Ledford <dledford@redhat.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: infinipath@intel.com
Cc: linux-rdma@vger.kernel.org
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
There is no good reason not to, we eventually delete it as well.
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Mike Marciniszyn <infinipath@intel.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Things Not To Do When Writing A Driver, part 1001st:
have writev() and write() on the same file doing completely
different things. As in, "interpret very different sets of
commands".
We _can_ handle that, but it's a bloody bad idea.
Don't do that in new drivers. Ever.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull more vfs updates from Al Viro:
"Assorted stuff from this cycle. The big ones here are multilayer
overlayfs from Miklos and beginning of sorting ->d_inode accesses out
from David"
* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (51 commits)
autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation
procfs: fix race between symlink removals and traversals
debugfs: leave freeing a symlink body until inode eviction
Documentation/filesystems/Locking: ->get_sb() is long gone
trylock_super(): replacement for grab_super_passive()
fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions
Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions
VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry)
SELinux: Use d_is_positive() rather than testing dentry->d_inode
Smack: Use d_is_positive() rather than testing dentry->d_inode
TOMOYO: Use d_is_dir() rather than d_inode and S_ISDIR()
Apparmor: Use d_is_positive/negative() rather than testing dentry->d_inode
Apparmor: mediated_filesystem() should use dentry->d_sb not inode->i_sb
VFS: Split DCACHE_FILE_TYPE into regular and special types
VFS: Add a fallthrough flag for marking virtual dentries
VFS: Add a whiteout dentry type
VFS: Introduce inode-getting helpers for layered/unioned fs environments
Infiniband: Fix potential NULL d_inode dereference
posix_acl: fix reference leaks in posix_acl_create
autofs4: Wrong format for printing dentry
...
Upstream checkpatch now requires this.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Code that does this:
if (!(d_unhashed(tmp) && tmp->d_inode)) {
...
simple_unlink(parent->d_inode, tmp);
}
is broken because:
!(d_unhashed(tmp) && tmp->d_inode)
is equivalent to:
!d_unhashed(tmp) || !tmp->d_inode
so it is possible to get into simple_unlink() with tmp->d_inode == NULL.
simple_unlink(), however, assumes tmp->d_inode cannot be NULL.
I think that what was meant is this:
!d_unhashed(tmp) && tmp->d_inode
and that the logical-not operator or the final close-bracket was misplaced.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Bryan O'Sullivan <bos@pathscale.com>
cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The uses of "rcu_assign_pointer()" are NULLing out the pointers.
According to RCU_INIT_POINTER()'s block comment:
"1. This use of RCU_INIT_POINTER() is NULLing out the pointer"
it is better to use it instead of rcu_assign_pointer() because it has a
smaller overhead.
The following Coccinelle semantic patch was used:
@@
@@
- rcu_assign_pointer
+ RCU_INIT_POINTER
(..., NULL)
[Derived from http://marc.info/?l=linux-rdma&m=140836519219236&w=2]
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
According to RCU_INIT_POINTER()'s block comment 3.a, it can be used if
"1. This use of RCU_INIT_POINTER() is NULLing out the pointer"
it is better to use it instead of rcu_assign_pointer() because it has a
smaller overhead.
"3. The referenced data structure has already been exposed to readers either
at compile time or via rcu_assign_pointer() -and-
a. You have not made -any- reader-visible changes to this structure since
then".
These cases fulfill the conditions above because between the
rcu_dereference_protected() call and the rcu_assign_pointer() call
there is no update of that value. Therefore, this patch makes the
replacement.
The following Coccinelle semantic patch was used:
@@
@@
- rcu_assign_pointer
+ RCU_INIT_POINTER
(...,
(
rtnl_dereference(...)
|
rcu_dereference_protected(...)
) )
[consolidated from http://marc.info/?l=linux-rdma&m=140836578119485&w=2 and
http://marc.info/?l=linux-rdma&m=140906361403047&w=2]
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Add support to recognize another board variation named QMH7360.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This changeset removes all the code that allows the driver to write to
the EEPROM and update the recorded error counters and power on hours.
These two stats are unused and writing them exposes a timing risk
which could leave the EEPROM in a bad state preventing further normal
operation of the HCA.
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In order to allow umems that do not pin memory, we need the umem to
keep track of its region's address.
This makes the offset field redundant, and so this patch removes it.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This particular reference count is not needed with the rcu protection,
and the current code leaks a reference count, causing a hang in
qib_qp_destroy().
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The static helper routine, __qib_get_user_pages(), accepts a vma arg,
but current use always passes NULL.
This has caused some confusion associated with the correct use of this
argument, but since the current use case doesn't require the
flexiblity, the best thing to do is to simplfy the code to always pass
NULL to get_user_pages().
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Miscellaneous
- Remove DEFINE_PCI_DEVICE_TABLE macro use (Benoit Taine)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJT7PyAAAoJEFmIoMA60/r8kjQQALr/8oEfZoVcjgCb7waWOr25
hUTnrI6GBIAh/50hoBiPq0ouPCAKVv66+CUhuhFkLP7oJz+rMU0B9hfUvdLfmCpH
7ppaallkllT9nPFIr7h5RUWLXsoQyuHmCYmSrUCcnlT2LPgU0dN72YWElLisEM6Z
Pldg3933xyIQaCWviHjGEjWb7NvC+JY4pTkV5iyqGgU8Ale/eFYtLLSfdBEjIbGv
VDirYZmKELYeuncZPrTAsp4IENRMZn702wwDakMSODVMEWtJB5h4yrBawqQDlFP5
9ztIX6n9p9zkdVKbYZlx/Xwv6SYEnYXLxauVQMSO3Nck7Z10R5Ud+5uuCg/6mWH8
AQI4UV5bbJcg7zHgocTG9XLFLFPoPtD2JT6k6UT1LeUAiAOqcSzhRO+/qJBmJOWZ
Zv+EHXPlxBrl0zNifut6ZQrY17teuItVtmha70a/9W3PjnIx3KecqLcTwdTvDsOY
IAyH8WMZrBKpPpsczSmfE93i2Z1QRS91HEAOeSMxl/98dcDTdllYZS7spjoDll2f
xmpGDbpriLSCu2XsGHfTC9RbqA7CyuFlHggJSQDkT/5Esli0sCs7eweTuK3RVvPu
t6bUHK3yElb6x9qMZhb5q6l72wSMlGMishTdaxEHmqrEA8PtaIFodmVX2T/Zel5n
GHN6bysPqDItNR2v/3JX
=jJGu
-----END PGP SIGNATURE-----
Merge tag 'pci-v3.17-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull DEFINE_PCI_DEVICE_TABLE removal from Bjorn Helgaas:
"Part two of the PCI changes for v3.17:
- Remove DEFINE_PCI_DEVICE_TABLE macro use (Benoit Taine)
It's a mechanical change that removes uses of the
DEFINE_PCI_DEVICE_TABLE macro. I waited until later in the merge
window to reduce conflicts, but it's possible you'll still see a few"
* tag 'pci-v3.17-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Remove DEFINE_PCI_DEVICE_TABLE macro use
We should prefer `struct pci_device_id` over `DEFINE_PCI_DEVICE_TABLE` to
meet kernel coding style guidelines. This issue was reported by checkpatch.
A simplified version of the semantic patch that makes this change is as
follows (http://coccinelle.lip6.fr/):
// <smpl>
@@
identifier i;
declarer name DEFINE_PCI_DEVICE_TABLE;
initializer z;
@@
- DEFINE_PCI_DEVICE_TABLE(i)
+ const struct pci_device_id i[]
= z;
// </smpl>
[bhelgaas: add semantic patch]
Signed-off-by: Benoit Taine <benoit.taine@lip6.fr>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Registrations options are specified through flags. Definitions of flags will
be in subsequent patches.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fix the usnic and thw qib drivers to err when QP creation flags that
they don't understand are provided.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patches changes user visible function names containing "qlogic"
in module init and cleanup.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The code used a literal 1 in dispatching an IB_EVENT_PKEY_CHANGE.
As of the dual port qib QDR card, this is not necessarily correct.
Change to use the port as specified in the call.
Cc: <stable@vger.kernel.org>
Reported-by: Alex Estrin <alex.estrin@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
As result of the deprecation of the MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block(), all drivers using these
two interfaces need to be updated to use the new pci_enable_msi_range()
and pci_enable_msix_range() interfaces.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Remove the overload for .dma_len and .dma_address
The removal of these methods is compensated for by code changes to
.map_sg to insure that the vanilla sg_dma_address() and sg_dma_len()
will do the same thing as the equivalent former ib_sg_dma_address()
and ib_sg_dma_len() calls into the drivers.
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Tested-by: Vinod Kumar <vinod.kumar@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Returning directly is easier to read than do-nothing gotos. Remove the
duplicative check on "olp" and pull the code in one indent level.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Improve performance by changing the behavour of the driver when all
SDMA descriptors are in use, and the processes adding new descriptors
are single- or multi-rail.
For single-rail processes, the driver will block the call and finish
posting all SDMA descriptors onto the hardware queue before returning
back to PSM. Repeated kernel calls are slower than blocking.
For multi-rail processes, the driver will return to PSM as quick as
possible so PSM can feed packets to other rail. If all hardware
queues are full, PSM will buffer the remaining SDMA descriptors until
notified by interrupt that space is available.
This patch builds a red-black tree to track the number rails opened by
a particular PID. If the number is more than one, it is a multi-rail
PSM process, otherwise, it is a single-rail process.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: John A Gregor <john.a.gregor@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We already know "pusable" is non-zero, no need to check again.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In qib_create_ctxts() we allocate an array to hold recv contexts. Then attempt
to create data for those recv contexts. If that call to qib_create_ctxtdata()
fails then an error is returned but the previously allocated memory is not
freed.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit af061a644a add some code in qib_ib_rcv() which
trigger a warning from coccicheck (coccinelle/spatch):
$ make C=2 CHECK=scripts/coccicheck drivers/infiniband/hw/qib/
CHECK drivers/infiniband/hw/qib/qib_verbs.c
drivers/infiniband/hw/qib/qib_verbs.c:679:5-32: code aligned with following code on line 681
CC [M] drivers/infiniband/hw/qib/qib_verbs.o
In fact, according to similar code in qib_kreceive(),
qib_ib_rcv() code is correct but improperly indented.
This patch fix indentation for the misaligned portion.
Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: infinipath@intel.com
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: cocci@systeme.lip6.fr
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit c804f07248 moved qib_assign_ctxt() to
do_qib_user_sdma_queue_create() but dropped the braces
around the statements.
This was spotted by coccicheck (coccinelle/spatch):
$ make C=2 CHECK=scripts/coccicheck drivers/infiniband/hw/qib/
CHECK drivers/infiniband/hw/qib/qib_file_ops.c
drivers/infiniband/hw/qib/qib_file_ops.c:1583:2-23: code aligned with following code on line 1587
This patch adds braces back.
Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: infinipath@intel.com
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: cocci@systeme.lip6.fr
Cc: stable@vger.kernel.org
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The counters, unicast_xmit, unicast_rcv, multicast_xmit, multicast_rcv
are now maintained as percpu variables.
The mad code is modified to add a z_ latch so that the percpu counters
monotonically increase with appropriate adjustments in the reset,
read logic to maintain the z_ latch.
This patch also corrects the fact the unitcast_xmit wasn't handled
at all for UC and RC QPs.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch replaces the dd->int_counter with a percpu counter.
The maintanance of qib_stats.sps_ints and int_counter are
combined into the new counter.
There are two new functions added to read the counter:
- qib_int_counter (for a particular qib_devdata)
- qib_sps_ints (for all HCAs)
A z_int_counter is added to allow the interrupt detection logic
to determine if interrupts have occured since z_int_counter
was "reset".
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The debugfs init code was incorrectly called before the idr mechanism
is used to get the unit number, so the dd->unit hasn't been
initialized. This caused the unit relative directory creation to fail
after the first.
This patch moves the init for the debugfs stuff until after all of the
failures and after the unit number has been determined.
A bug in unwind code in qib_alloc_devdata() is also fixed.
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Guard against a potential buffer overrun. Right now the qib driver is
protected by the fact that the data structure in question is only 16
bits. Should that ever change the problem will be exposed. There is a
similar defect in the ipath driver and this brings the two code paths
into sync.
Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch refactors the IB core umem code and vendor drivers to use a
linear (chained) SG table instead of chunk list. With this change the
relevant code becomes clearer—no need for nested loops to build and
use umem.
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Research has shown that commit a77fcf8950 ("IB/qib: Use a single
txselect module parameter for serdes tuning") missed a key serdes init
sequence.
This patch add that sequence.
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The GSI QP type is compatible with and should be allowed to send data
to/from any UD QP. This was found when testing ibacm on the same node
as an SA.
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch add the support for Ethernet L2 attributes in the
verbs/cm/cma structures.
When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
in a similar manner that the IB L2 (and the L4 PKEY) attributes are used.
Thus, those attributes were added to the following structures:
* ib_ah_attr - added dmac
* ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
* ib_wc - added smac, vlan_id
* ib_sa_path_rec - added smac, dmac, vlan_id
* cm_av - added smac and vlan_id
For the path record structure, extra care was taken to avoid the new
fields when packing it into wire format, so we don't break the IB CM
and SA wire protocol.
On the active side, the CM fills. its internal structures from the
path provided by the ULP. We add there taking the ETH L2 attributes
and placing them into the CM Address Handle (struct cm_av).
On the passive side, the CM fills its internal structures from the WC
associated with the REQ message. We add there taking the ETH L2
attributes from the WC.
When the HW driver provides the required ETH L2 attributes in the WC,
they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
code checks for the presence of these flags, and in their absence does
address resolution from the ib_init_ah_from_wc() helper function.
ib_modify_qp_is_ok is also updated to consider the link layer. Some
parameters are mandatory for Ethernet link layer, while they are
irrelevant for IB. Vendor drivers are modified to support the new
function signature.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
- Re-enable flow steering verbs with new improved userspace ABI
- Fixes for slow connection due to GID lookup scalability
- IPoIB fixes
- Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
- Further improvements to SRP error handling
- Add new transport type for Cisco usNIC
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABCAAGBQJSil7BAAoJEENa44ZhAt0hbtgP/A+AmUalbOX6ZKzuOFxsrtY2
r55CX9b1JBeFM/Zhn2o6y+81lpCjkckJSggESMe4izNgocGw0nW4vYGN4SBynatj
y8sR9OSn+G3ihuENrzG41MJUGEa5WbcNMy4boN+Oa+qyTlV/WjLR7Fv4WbikK7Wm
o8FNlXiiDhMoGfHHG5J0MD0EQsnxuLDk2XP+ciu4tLtTs+wBka+gFK8WnMvztle3
gTeMNna5ilvCS2fdBxteuPA3KeDnJE9AgJSMJ2a4Rh+DR8uTgWYQ6n7amjmOc546
yhAKkoBkxPE10+Yj82WOPhCFxSeWcuSwJvpgv5dTVZ1XqUUcC1V3TEcZDHmyyHQ7
uPXgS1A+erBW3OYPBjZqtKvnHObscV12fL+rId3vIhcAQIbFroci08ZwPidEYRkn
fvwlEKcrIsBIpRXEyjlFCxsiiDnfq1wC1VayMR3jrIK0P6idf1SXf/geiRp9+RGT
wKUc0j51jvEx29qc65xuhEP9FQV9pCMxyd+FEE0d0KkjMz5hsIkjmcUcBbgF0CGg
GEyDPlgRLv+vmWDGpT8XraaV/0CJOEQDIgB4WSN87/AZ4UoNt7spW2xqsLsp1toy
5e0100tpWUleTPLe/Wig5GtBdagQ2jAUK1+186CP93pFPtkwc4/7X3hyp7qPIPTz
VDvT9DEy6zjSMCLpMcdo
=xxC+
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband/rdma updates from Roland Dreier:
- Re-enable flow steering verbs with new improved userspace ABI
- Fixes for slow connection due to GID lookup scalability
- IPoIB fixes
- Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
- Further improvements to SRP error handling
- Add new transport type for Cisco usNIC
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
IB/core: Re-enable create_flow/destroy_flow uverbs
IB/core: extended command: an improved infrastructure for uverbs commands
IB/core: Remove ib_uverbs_flow_spec structure from userspace
IB/core: Use a common header for uverbs flow_specs
IB/core: Make uverbs flow structure use names like verbs ones
IB/core: Rename 'flow' structs to match other uverbs structs
IB/core: clarify overflow/underflow checks on ib_create/destroy_flow
IB/ucma: Convert use of typedef ctl_table to struct ctl_table
IB/cm: Convert to using idr_alloc_cyclic()
IB/mlx5: Fix page shift in create CQ for userspace
IB/mlx4: Fix device max capabilities check
IB/mlx5: Fix list_del of empty list
IB/mlx5: Remove dead code
IB/core: Encorce MR access rights rules on kernel consumers
IB/mlx4: Fix endless loop in resize CQ
RDMA/cma: Remove unused argument and minor dead code
RDMA/ucma: Discard events for IDs not yet claimed by user space
IB/core: Add Cisco usNIC rdma node and transport types
RDMA/nes: Remove self-assignment from nes_query_qp()
IB/srp: Report receive errors correctly
...
* lookup_one_len() really wants i_mutex held on directory.
* leaks galore - just mount ipathfs, then
cd /sys/bus/pci/drivers/qib_ib; echo *:*:*.* >unbind
on a box with that card present and try to umount ipathfs...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Commit 7fac33014f54("IB/qib: checkpatch fixes") was overzealous in
removing a simple_strtoul for a parse routine, setup_txselect(). That
routine is required to handle a multi-value string.
Unwind that aspect of the fix.
Cc: <stable@vger.kernel.org>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Convert __attribute__ ((packed)) to __packed.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
qib_user_sdma_queue_pkts() gets called with mmap_sem held for
writing. Except for get_user_pages() deep down in
qib_user_sdma_pin_pages() we don't seem to need mmap_sem at all. Even
more interestingly the function qib_user_sdma_queue_pkts() (and also
qib_user_sdma_coalesce() called somewhat later) call copy_from_user()
which can hit a page fault and we deadlock on trying to get mmap_sem
when handling that fault.
So just make qib_user_sdma_pin_pages() use get_user_pages_fast() and
leave mmap_sem locking for mm.
This deadlock has actually been observed in the wild when the node
is under memory pressure.
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The callers of qib_tune_pcie_caps() and qib_tune_pcie_coalesce() don't
check the return values, so this patch drops the return values altogether.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Refactor qib_tune_pcie_caps(). Use pcie_get_mps(), pcie_set_mps(),
pcie_get_readrq(), and pcie_set_readrq() to simplify the code. The PCI
core caches the "PCIe Max Payload Size Supported" in pci_dev->pcie_mpss,
so use that instead of pcie_capability_read_word(). Remove the unused
val2fld() and fld2val().
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Use pci_is_root_bus() instead of "if (bus->parent)" statement
for better readability.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
After the last architecture switched to generic hard irqs the config
options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
for !CONFIG_GENERIC_HARDIRQS can be removed.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit 36a8f01cd2 ("IB/qib: Add congestion control agent
implementation") caused statements to leak pass the header guard.
Fix this.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
PCI core will initialize device MSI/MSI-X capability in
pci_msi_init_pci_dev(). So device drivers should use
pci_dev->msi_cap/msix_cap to determine whether a device supports
MSI/MSI-X instead of using pci_find_capability(pci_dev,
PCI_CAP_ID_MSI/MSIX). Access to PCIe device config space again will
consume more time.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
struct pci_driver qib_driver is only used in qib_init.c. Remove it
from qib.h and make it static in qib_init.c.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Roland Dreier <roland@purestorage.com>
1. The code accepts chunks of messages, and splits the chunk into
packets when converting packets into sdma queue entries. Adjacent
packets will use user buffer pages smartly to avoid pinning the
same page multiple times.
2. Instead of discarding all the work when SDMA queue is full, the
work is saved in a pending queue. Whenever there are enough SDMA
queue free entries, pending queue is directly put onto SDMA queue.
3. An interrupt handler is used to progress this pending queue.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
[ Fixed up sparse warnings. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit 0b3ddf380c ("Log all SDMA errors unconditionally") missed
part of the patch.
This also corrects a format warning when dma_addr_t is 32 bits
on a 64 bit system.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
- AF_IB (native IB addressing) for CMA from Sean Hefty
- New mlx5 driver for Mellanox Connect-IB adapters (including post merge request fixes)
- SRP fixes from Bart Van Assche (including fix to first merge request)
- qib HW driver updates
- Resurrection of ocrdma HW driver development
- uverbs conversion to create fds with O_CLOEXEC set
- Other small changes and fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJR30TKAAoJEENa44ZhAt0h854P/jvAhK5u+XTM5VyjAi0DKJ7P
bWcsu+KxbOIFnjEdsYQl1mGP44gdO8GPZp7+JR5nDHDRpw9K76qy6QQiPbaF6Y8D
cZH8Xlq4hzBfElTWBkExEemPrVUUq77j03FE9TBatdLAtEyYkgrNyqr7Ys6zVwVK
ugR8nAahvnB7Jh1tsyZBBd9kfbWtXJnaGC8/Zk3Na4n4zXRAbr0DcnRF0sncTL38
VFnWbi33OQAxu5bsb2jGec/SNP3BbNwspFPjSCKqiiItRaCj13JiHhrKKvVk4RZe
hIRnPH47kjLRp2/PwBo6o+gTXZuRg48VGBx4CKUTwx1nCzPPN1iz9ZOfqUv9Qwcv
LX8mxC7QS/Yvud4KeEBsj6kotb80EkRF2KV5RkIKCxQiwetGD9127bZylC8ttxGw
2f6MzYtAGD4R4C10lO8N+59VugSg1xAvwsqz0a/jy2XyVHbI1ugQedzkB20x5WPY
51S08ABvtU9yIxIYrw2VEaa/5WN+XJ6+LpG9QBAGXdMLiCiiAe7n/YzyXI6AgwaW
Jl/uKr6H6/jEHUHKwkyqsmbpVGPhtGWu8deyr1oYvOEP4i48gcDqMQsfMcCISrQV
MeQU3hS/obykUlNeqjmMI2CXrecqSsiq0hXd4DLaSoZ2Rb4Drx2Wj6sTQLIAgL2q
GBYjHWMUpZXIFHQaH7am
=nZh8
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA changes from Roland Dreier:
- AF_IB (native IB addressing) for CMA from Sean Hefty
- new mlx5 driver for Mellanox Connect-IB adapters (including post
merge request fixes)
- SRP fixes from Bart Van Assche (including fix to first merge request)
- qib HW driver updates
- resurrection of ocrdma HW driver development
- uverbs conversion to create fds with O_CLOEXEC set
- other small changes and fixes
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
mlx5: Return -EFAULT instead of -EPERM
IB/qib: Log all SDMA errors unconditionally
IB/qib: Fix module-level leak
mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec
IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline
IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd()
mlx5_core: Fixes for sparse warnings
IB/mlx5: Make profile[] static in main.c
mlx5: Fix parameter type of health_handler_t
mlx5: Add driver for Mellanox Connect-IB adapters
IB/core: Add reserved values to enums for low-level driver use
IB/srp: Bump driver version and release date
IB/srp: Make HCA completion vector configurable
IB/srp: Maintain a single connection per I_T nexus
IB/srp: Fail I/O fast if target offline
IB/srp: Skip host settle delay
IB/srp: Avoid skipping srp_reset_host() after a transport error
IB/srp: Fix remove_one crash due to resource exhaustion
IB/qib: New transmitter tunning settings for Dell 1.1 backplane
IB/core: Fix error return code in add_port()
...
This patch adds code to log SDMA errors for supportability purposes.
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The vzalloc()'ed field physshadow is leaked on module unload.
This patch adds vfree after the sibling page shadow is freed.
Reported-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Calling dev_set_name with a single paramter causes it to be handled as a
format string. Many callers are passing potentially dynamic string
content, so use "%s" in those cases to avoid any potential accidents,
including wrappers like device_create*() and bdi_register().
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Dell blade chassis got an updated backplane which requires new
transmitter tuning settings.
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This adds a seq_file iterator for reporting the QP hash table when the
qp_stats file is read.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch adds a debugfs stats interface for per kernel contexts
packet counts.
The code uses the opcode stats count and eliminates the counter in the
context.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This fix changes the opcode relative counters for receive to per
context.
Profiling has shown that when mulitple contexts are being used there
is a lot of cache activity associated with these counters.
The code formerly kept these counters per port, but only provided the
interface to read per HCA. This patch converts the read of counters
to per HCA and adds the debugfs hooks to be able to read the file as a
sequence of opcodes.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The current workqueue implemention has the following performance
deficiencies on QDR HCAs:
- The CQ call backs tend to run on the CPUs processing the
receive queues
- The single thread queue isn't optimal for multiple HCAs
This patch adds a dedicated per HCA bound thread to process CQ callbacks.
Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The driver currently selects a HCA based on the algorithm that PSM
chooses, contexts within a HCA or across. The HCA can also be chosen
by the user. Either way, this patch assigns a CPU on the NUMA node
local to the selected HCA. This patch also tries to select the HCA
closest to the NUMA node of the CPU assigned via taskset to PSM
process. If this HCA is unusable then another unit is selected based
on the algorithm that is currently enforced or selected by PSM - round
robin context selection 'within' or 'across' HCA's.
Fixed a bug wherein contexts are setup on the NUMA node on which the
processes are opened (setup_ctxt()) and not on the NUMA node that the
driver recommends the CPU on.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch adds context relative numa affinity conditioned on the
module parameter numa_aware. The qib_ctxtdata has an additional
node_id member and qib_create_ctxtdata() has an addition node_id
parameter.
The allocations within the hdr queue and eager queue setup routines
now take this additional member and adjust allocations as necesary.
PSM will pass the either current numa node or the node closest to the
HCA depending on numa_aware. Verbs will always use the node closest to
the HCA.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
External PSM repositories have advanced the minor number for a variety
of reasons. The driver needs to increase to avoid warnings.
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Follow Documentation/RCU/rcuref.txt guidance in removing
atomic_inc_not_zero() from QP RCU implementation.
This patch also removes an unneeded synchronize_rcu() in the add path.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch adds DCA cache warming for systems that support DCA.
The code uses cpu affinity notification to react to an affinity change
from a user mode program like irqbalance and (re-)program the chip
accordingly. This notification avoids reading the current cpu on every
interrupt.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
[ Add Kconfig dependency on SMP && GENERIC_HARDIRQS to avoid failure to
build due to undefined struct irq_affinity_notify. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>