Commit Graph

6698 Commits

Author SHA1 Message Date
Aditya Sarwade
b172679b0d RDMA/vmw_pvrdma: Activate device on ethernet link up
Restore device state when ethernet link changes to active.

Acked-by: George Zhang <georgezhang@vmware.com>
Acked-by: Jorgen Hansen <jhansen@vmware.com>
Acked-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Aditya Sarwade <asarwade@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 20:49:53 -04:00
Adit Ranadive
e51c2fb033 RDMA/vmw_pvrdma: Dont hardcode QP header page
Moved the header page count to a macro.

Reported-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Tested-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 20:49:53 -04:00
Adit Ranadive
6332dee83d RDMA/vmw_pvrdma: Cleanup unused variables
Removed the unused nreq and redundant index variables.
Moved hardcoded async and cq ring pages number to macro.

Reported-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Tested-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 20:49:53 -04:00
Jason Gunthorpe
cb88645596 infiniband: Fix alignment of mmap cookies to support VIPT caching
When vmalloc_user is used to create memory that is supposed to be mmap'd
to user space, it is necessary for the mmap cookie (eg the offset) to be
aligned to SHMLBA.

This creates a situation where all virtual mappings of the same physical
page share the same virtual cache index and guarantees VIPT coherence.
Otherwise the cache is non-coherent and the kernel will not see writes
by userspace when reading the shared page (or vice-versa).

Reported-by: Josh Beavers <josh.beavers@gmail.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 16:50:51 -04:00
Sagi Grimberg
86f46aba8d IB/core: Protect against self-requeue of a cq work item
We need to make sure that the cq work item does not
run when we are destroying the cq. Unlike flush_work,
cancel_work_sync protects against self-requeue of the
work item (which we can do in ib_cq_poll_work).

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>--
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 16:40:31 -04:00
Shiraz Saleem
871a8623d3 i40iw: Receive netdev events post INET_NOTIFIER state
Netdev notification events are de-registered only when all
client iwdev instances are removed. If a single client is closed
and re-opened, netdev events could arrive even before the Control
Queue-Pair (CQP) is created, causing a NULL pointer dereference crash
in i40iw_get_cqp_request. Fix this by allowing netdev event
notification only after we have reached the INET_NOTIFIER state with
respect to device initialization.

Reported-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24 16:23:29 -04:00
Mintz, Yuval
be086e7c53 qed*: Utilize Firmware 8.15.3.0
This patch advances the qed* drivers into using the newer firmware -
This solves several firmware bugs, mostly related [but not limited to]
various init/deinit issues in various offloaded protocols.

It also introduces a major 4-Cached SGE change in firmware, which can be
seen in the storage drivers' changes.

In addition, this firmware is required for supporting the new QL41xxx
series of adapters; While this patch doesn't add the actual support,
the firmware contains the necessary initialization & firmware logic to
operate such adapters [actual support would be added later on].

Changes from Previous versions:
-------------------------------
 - V2 - fix kbuild-test robot warnings

Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Chad Dupuis <Chad.Dupuis@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-13 15:33:09 -07:00
Ingo Molnar
0881e7bd34 sched/headers: Prepare to move the get_task_struct()/put_task_struct() and related APIs from <linux/sched.h> to <linux/sched/task.h>
But first update usage sites with the new header dependency.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:40 +01:00
Ingo Molnar
589ee62844 sched/headers: Prepare to remove the <linux/mm_types.h> dependency from <linux/sched.h>
Update code that relied on sched.h including various MM types for them.

This will allow us to remove the <linux/mm_types.h> include from <linux/sched.h>.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:37 +01:00
Ingo Molnar
174cd4b1e5 sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>
Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:32 +01:00
Ingo Molnar
3f07c01441 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h>
We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:29 +01:00
Ingo Molnar
6e84f31522 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h>
We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/mm.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

The APIs that are going to be moved first are:

   mm_alloc()
   __mmdrop()
   mmdrop()
   mmdrop_async_fn()
   mmdrop_async()
   mmget_not_zero()
   mmput()
   mmput_async()
   get_task_mm()
   mm_access()
   mm_release()

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:28 +01:00
Ingo Molnar
0c98d344fe sched/core: Remove the tsk_cpus_allowed() wrapper
So the original intention of tsk_cpus_allowed() was to 'future-proof'
the field - but it's pretty ineffectual at that, because half of
the code uses ->cpus_allowed directly ...

Also, the wrapper makes the code longer than the original expression!

So just get rid of it. This also shrinks <linux/sched.h> a bit.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:24 +01:00
Linus Torvalds
86292b33d4 Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

 - a few MM remainders

 - misc things

 - autofs updates

 - signals

 - affs updates

 - ipc

 - nilfs2

 - spelling.txt updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (78 commits)
  mm, x86: fix HIGHMEM64 && PARAVIRT build config for native_pud_clear()
  mm: add arch-independent testcases for RODATA
  hfs: atomically read inode size
  mm: clarify mm_struct.mm_{users,count} documentation
  mm: use mmget_not_zero() helper
  mm: add new mmget() helper
  mm: add new mmgrab() helper
  checkpatch: warn when formats use %Z and suggest %z
  lib/vsprintf.c: remove %Z support
  scripts/spelling.txt: add some typo-words
  scripts/spelling.txt: add "followings" pattern and fix typo instances
  scripts/spelling.txt: add "therfore" pattern and fix typo instances
  scripts/spelling.txt: add "overwriten" pattern and fix typo instances
  scripts/spelling.txt: add "overwritting" pattern and fix typo instances
  scripts/spelling.txt: add "deintialize(d)" pattern and fix typo instances
  scripts/spelling.txt: add "disassocation" pattern and fix typo instances
  scripts/spelling.txt: add "omited" pattern and fix typo instances
  scripts/spelling.txt: add "explictely" pattern and fix typo instances
  scripts/spelling.txt: add "applys" pattern and fix typo instances
  scripts/spelling.txt: add "configuartion" pattern and fix typo instances
  ...
2017-02-27 23:09:29 -08:00
Linus Torvalds
f7878dc3a9 Merge branch 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "Several noteworthy changes.

   - Parav's rdma controller is finally merged. It is very straight
     forward and can limit the abosolute numbers of common rdma
     constructs used by different cgroups.

   - kernel/cgroup.c got too chubby and disorganized. Created
     kernel/cgroup/ subdirectory and moved all cgroup related files
     under kernel/ there and reorganized the core code. This hurts for
     backporting patches but was long overdue.

   - cgroup v2 process listing reimplemented so that it no longer
     depends on allocating a buffer large enough to cache the entire
     result to sort and uniq the output. v2 has always mangled the sort
     order to ensure that users don't depend on the sorted output, so
     this shouldn't surprise anybody. This makes the pid listing
     functions use the same iterators that are used internally, which
     have to have the same iterating capabilities anyway.

   - perf cgroup filtering now works automatically on cgroup v2. This
     patch was posted a long time ago but somehow fell through the
     cracks.

   - misc fixes asnd documentation updates"

* 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (27 commits)
  kernfs: fix locking around kernfs_ops->release() callback
  cgroup: drop the matching uid requirement on migration for cgroup v2
  cgroup, perf_event: make perf_event controller work on cgroup2 hierarchy
  cgroup: misc cleanups
  cgroup: call subsys->*attach() only for subsystems which are actually affected by migration
  cgroup: track migration context in cgroup_mgctx
  cgroup: cosmetic update to cgroup_taskset_add()
  rdmacg: Fixed uninitialized current resource usage
  cgroup: Add missing cgroup-v2 PID controller documentation.
  rdmacg: Added documentation for rdmacg
  IB/core: added support to use rdma cgroup controller
  rdmacg: Added rdma cgroup controller
  cgroup: fix a comment typo
  cgroup: fix RCU related sparse warnings
  cgroup: move namespace code to kernel/cgroup/namespace.c
  cgroup: rename functions for consistency
  cgroup: move v1 mount functions to kernel/cgroup/cgroup-v1.c
  cgroup: separate out cgroup1_kf_syscall_ops
  cgroup: refactor mount path and clearly distinguish v1 and v2 paths
  cgroup: move cgroup v1 specific code to kernel/cgroup/cgroup-v1.c
  ...
2017-02-27 21:41:08 -08:00
Vegard Nossum
f1f1007644 mm: add new mmgrab() helper
Apart from adding the helper function itself, the rest of the kernel is
converted mechanically using:

  git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)->mm_count);/mmgrab\(\1\);/'
  git grep -l 'atomic_inc.*mm_count' | xargs sed -i 's/atomic_inc(&\(.*\)\.mm_count);/mmgrab\(\&\1\);/'

This is needed for a later patch that hooks into the helper, but might
be a worthwhile cleanup on its own.

(Michal Hocko provided most of the kerneldoc comment.)

Link: http://lkml.kernel.org/r/20161218123229.22952-1-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:48 -08:00
Masahiro Yamada
608595ed9b scripts/spelling.txt: add "therfore" pattern and fix typo instances
Fix typos and add the following to the scripts/spelling.txt:

  therfore||therefore

Besides, tidy up comment blocks for 80-col wrapping.

Link: http://lkml.kernel.org/r/1481573103-11329-31-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:47 -08:00
Masahiro Yamada
b8a14f3379 scripts/spelling.txt: add "overwriten" pattern and fix typo instances
Fix typos and add the following to the scripts/spelling.txt:

  overwrien||overwritten

Link: http://lkml.kernel.org/r/1481573103-11329-30-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:47 -08:00
Linus Torvalds
ac1820fb28 This is a tree wide change and has been kept separate for that reason.
Bart Van Assche noted that the ib DMA mapping code was significantly
 similar enough to the core DMA mapping code that with a few changes
 it was possible to remove the IB DMA mapping code entirely and
 switch the RDMA stack to use the core DMA mapping code.  This resulted
 in a nice set of cleanups, but touched the entire tree.  This branch
 will be submitted separately to Linus at the end of the merge window
 as per normal practice for tree wide changes like this.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYo06oAAoJELgmozMOVy/d9Z8QALedWHdu98St1L0u2c8sxnR9
 2zo/4sF5Vb9u7FpmdIX32L4SQ9s9KhPE8Qp8NtZLf9v10zlDebIRJDpXknXtKooV
 CAXxX4sxBXV27/UrhbZEfXiPrmm6ccJFyIfRnMU6NlMqh2AtAsRa5AC2/RMp8oUD
 Med97PFiF0o6TD22/UH1VFbRpX1zjaKyqm7a3as5sJfzNA+UGIZAQ7Euz8000DKZ
 xCgVLTEwS0FmOujtBkCst7xa9TjuqR1HLOB4DdGvAhP6BHdz2yamM7Qmh9NN+NEX
 0BtjsuXomtn6j6AszGC+bpipCZh3NUigcwoFAARXCYFHibBvo4DPdFeGsraFgXdy
 1+KyR8CCeQG3Aly5Vwr264RFPGkGpwMj8PsBlXgQVtrlg4rriaCzOJNmIIbfdADw
 ftqhxBOzReZw77aH2s+9p2ILRfcAmPqhynLvFGFo9LBvsik8LVso7YgZN0xGxwcI
 IjI/XGC8UskPVsIZBIYA6sl2bYzgOjtBIHiXjRrPlW3uhduIXLrvKFfLPP/5XLAG
 ehLXK+J0bfsyY9ClmlNS8oH/WdLhXAyy/KNmnj5bRRm9qg6BRJR3bsOBhZJODuoC
 XgEXFfF6/7roNESWxowff7pK0rTkRg/m/Pa4VQpeO+6NWHE7kgZhL6kyIp5nKcwS
 3e7mgpcwC+3XfA/6vU3F
 =e0Si
 -----END PGP SIGNATURE-----

Merge tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma DMA mapping updates from Doug Ledford:
 "Drop IB DMA mapping code and use core DMA code instead.

  Bart Van Assche noted that the ib DMA mapping code was significantly
  similar enough to the core DMA mapping code that with a few changes it
  was possible to remove the IB DMA mapping code entirely and switch the
  RDMA stack to use the core DMA mapping code.

  This resulted in a nice set of cleanups, but touched the entire tree
  and has been kept separate for that reason."

* tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
  IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it
  IB/core: Remove ib_device.dma_device
  nvme-rdma: Switch from dma_device to dev.parent
  RDS: net: Switch from dma_device to dev.parent
  IB/srpt: Modify a debug statement
  IB/srp: Switch from dma_device to dev.parent
  IB/iser: Switch from dma_device to dev.parent
  IB/IPoIB: Switch from dma_device to dev.parent
  IB/rxe: Switch from dma_device to dev.parent
  IB/vmw_pvrdma: Switch from dma_device to dev.parent
  IB/usnic: Switch from dma_device to dev.parent
  IB/qib: Switch from dma_device to dev.parent
  IB/qedr: Switch from dma_device to dev.parent
  IB/ocrdma: Switch from dma_device to dev.parent
  IB/nes: Remove a superfluous assignment statement
  IB/mthca: Switch from dma_device to dev.parent
  IB/mlx5: Switch from dma_device to dev.parent
  IB/mlx4: Switch from dma_device to dev.parent
  IB/i40iw: Remove a superfluous assignment statement
  IB/hns: Switch from dma_device to dev.parent
  ...
2017-02-25 13:45:43 -08:00
Dave Jiang
11bac80004 mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.

Remove the vma parameter to simplify things.

[arnd@arndb.de: fix ARM build]
  Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Linus Torvalds
af17fe7a63 Mellanox specific updates for 4.11 merge window
Because the Mellanox code required being based on a net-next tree,
 I keept it separate from the remainder of the RDMA stack submission
 that is based on 4.10-rc3.
 
 This branch contains:
 
 - Various mlx4 and mlx5 fixes and minor changes
 - Support for adding a tag match rule to flow specs
 - Support for cvlan offload operation for raw ethernet QPs
 - A change to the core IB code to recognize raw eth capabilities and
   enumerate them (touches non-Mellanox code)
 - Implicit On-Demand Paging memory registration support
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYrx+WAAoJELgmozMOVy/du70P/1kpW2xY9Le04c3K7na2XOYl
 AUVIDrW/8Go63tpOaM7jBT3k4GlwVFr3IOmBpS24KbW/THxjhyUeP5L5+z2x+go+
 jkQOgtPWWEHr5zP3MzsNyB8fDx1YQOnJwEXxybQRW/cbw4CLjnhP+ezd6FdV/3Yy
 pPEqDVlAErzvNweG+n2r1pjcUbR8uneC3inyMLnyzUBz4CHKmC8fgD3/qJIM+DNb
 gtFT5xHFIXKCigWdQ/EwsTDcHub43V8OXlI5sO7loG6vToOUATMkjI4oOUNhDmYS
 X7XLN3yRK9QHEfb5kutXIZEWzTGh7LiFtUYGaNNYqqzDfSiMRc9NC5kTOfplEXDV
 Uo+AGb6Fh1zYIOzNk7o+tazIv3LaLv6+Fcm+9bbe0VUIqasaylsePqaTwMuIzx/I
 xP5nitmd5lbYo8WdlasVdG6mH1DlJEUbU30v4DpmTpxCP6jGpog7lexyGyF3TgzS
 NhnG0IiIClWh3WQ2/GdsFK/obIdFkpLeASli1hwD81vzPfly9zc2YpgqydZI3WCr
 q6hTXYnANcP6+eciCpQPO7giRdXdiKey08Uoq/2jxb7Qbm4daG6UwopjvH9/lm1F
 m6UDaDvzNYm+Rx+bL/+KSx9JO9+fJB1L51yCmvLGpWi6yJI4ZTfanHNMBsCua46N
 Kev/DSpIAzX1WOBkte+a
 =rspQ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull Mellanox rdma updates from Doug Ledford:
 "Mellanox specific updates for 4.11 merge window

  Because the Mellanox code required being based on a net-next tree, I
  keept it separate from the remainder of the RDMA stack submission that
  is based on 4.10-rc3.

  This branch contains:

   - Various mlx4 and mlx5 fixes and minor changes

   - Support for adding a tag match rule to flow specs

   - Support for cvlan offload operation for raw ethernet QPs

   - A change to the core IB code to recognize raw eth capabilities and
     enumerate them (touches non-Mellanox code)

   - Implicit On-Demand Paging memory registration support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (40 commits)
  IB/mlx5: Fix configuration of port capabilities
  IB/mlx4: Take source GID by index from HW GID table
  IB/mlx5: Fix blue flame buffer size calculation
  IB/mlx4: Remove unused variable from function declaration
  IB: Query ports via the core instead of direct into the driver
  IB: Add protocol for USNIC
  IB/mlx4: Support raw packet protocol
  IB/mlx5: Support raw packet protocol
  IB/core: Add raw packet protocol
  IB/mlx5: Add implicit MR support
  IB/mlx5: Expose MR cache for mlx5_ib
  IB/mlx5: Add null_mkey access
  IB/umem: Indicate that process is being terminated
  IB/umem: Update on demand page (ODP) support
  IB/core: Add implicit MR flag
  IB/mlx5: Support creation of a WQ with scatter FCS offload
  IB/mlx5: Enable QP creation with cvlan offload
  IB/mlx5: Enable WQ creation and modification with cvlan offload
  IB/mlx5: Expose vlan offloads capabilities
  IB/uverbs: Enable QP creation with cvlan offload
  ...
2017-02-23 11:27:49 -08:00
Linus Torvalds
4cc4b9323f First set of updates for 4.11 kernel merge window
- Add new Broadcom bnxt_re RoCE driver
 - rxe driver updates
 - ioctl cleanups
 - ETH_P_IBOE declaration cleanup
 - IPoIB changes
 - Add port state cache
 - Allow srpt driver to accept guids as port names in config
 - Update to hfi1 driver
 - Update to srp driver
 - Lots of misc. minor changes all over
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYrfewAAoJELgmozMOVy/dFnEP/2Qe7NqXRqxLS0ZqsQseFHgQ
 jd236E7R/XtQQTE3PTcrWL0mq0DRF6tMEjfhUASKTbZVfCBTniJAoXYrvWhN/STq
 LxAdigdV/0SPbxO3r9B1Xvk2v5BySaIBkaUDvcEXzT4e7UVQwZgxDkhhsYeY0Z/r
 9bNB5760PzW8uO5cctXccNcWztZnW0IUZuAHVfQCPjZ7svoGwLnNDW6YQx+FsEkW
 tbPdzMXX8VKHlC5RcKbfOOBjdNyrUpWl+uvWEc/7mazKscp4yKVFZL7PcxqPJSfd
 aKdfqXYawhjZZpyws8Kn0rhkfT7xWKD/y9G5STykRJPj9/n1BDScFkmyDQhtP5bJ
 GANzdgH0z7Dt9LkcAs86A8EVBbIdbdT2cpPVu7t0uWEIsJw/O5ThKpgjnrrTm6m+
 89tgqLZooifTEsdj4UkZoyktrD3J9LSNZkgVmWtRn01W3oYFOPbdM4TmBZtg+/Yl
 VGmOJEHMEsNuJBcJcOuRJ1MVz2LebXmPUcB0RXzgmHHgulZ/DqoOtlpg5JNmJcr5
 wpw/yppkBop4V4+etJBlzDsZNmZZlX+AY0ZLqQJsDHNszDjwXgAy5Rn5FYIdMyk4
 ff0FKb5dzASSxHRDxAsu2uoGaREM0NkpA0UYiIZbepGLSO8PuFG2ScQ6qzU47vqu
 9SEzOaaQY2S2uqFFFnYp
 =ugNm
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "First set of updates for 4.11 kernel merge window

   - Add new Broadcom bnxt_re RoCE driver
   - rxe driver updates
   - ioctl cleanups
   - ETH_P_IBOE declaration cleanup
   - IPoIB changes
   - Add port state cache
   - Allow srpt driver to accept guids as port names in config
   - Update to hfi1 driver
   - Update to srp driver
   - Lots of misc minor changes all over"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (114 commits)
  RDMA/bnxt_re: fix for "bnxt_en: Update to firmware interface spec 1.7.0."
  rdma_cm: fail iwarp accepts w/o connection params
  IB/srp: Drain the send queue before destroying a QP
  IB/core: Add support for draining IB_POLL_DIRECT completion queues
  IB/srp: Improve an error path
  IB/srp: Make a diagnostic message more informative
  IB/srp: Document locking conventions
  IB/srp: Fix race conditions related to task management
  IB/srp: Avoid that duplicate responses trigger a kernel bug
  IB/SRP: Avoid using IB_MR_TYPE_SG_GAPS
  RDMA/qedr: Fix some error handling
  RDMA/bnxt_re: add DCB dependency
  IB/hns: include linux/module.h
  IB/vmw_pvrdma: Expose vendor error to ULPs
  vmw_pvrdma: switch to pci_alloc_irq_vectors
  IB/hfi1: use size_t for passing array length
  IB/ipoib: Remove redudant label
  IB/ipoib: remove the unnecessary memory free
  IB/mthca: switch to pci_alloc_irq_vectors
  IB/hfi1: Code reuse with memdup_copy
  ...
2017-02-23 08:27:57 -08:00
Stephen Rothwell
db690328a7 RDMA/bnxt_re: fix for "bnxt_en: Update to firmware interface spec 1.7.0."
When the firmware interface spec was updated, a constant element was
renamed.  The rename missed the instances in the bnxt_re driver
because it wasn't upstream yet.  This updates the bnxt_re driver
with the rename.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-22 15:40:17 -05:00
Steve Wise
f2625f7db4 rdma_cm: fail iwarp accepts w/o connection params
cma_accept_iw() needs to return an error if conn_params is NULL.
Since this is coming from user space, we can crash.

Reported-by: Shaobo He <shaobo@cs.utah.edu>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-22 15:35:03 -05:00
Linus Torvalds
3051bf36c2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini
      Varadhan.

   2) Simplify classifier state on sk_buff in order to shrink it a bit.
      From Willem de Bruijn.

   3) Introduce SIPHASH and it's usage for secure sequence numbers and
      syncookies. From Jason A. Donenfeld.

   4) Reduce CPU usage for ICMP replies we are going to limit or
      suppress, from Jesper Dangaard Brouer.

   5) Introduce Shared Memory Communications socket layer, from Ursula
      Braun.

   6) Add RACK loss detection and allow it to actually trigger fast
      recovery instead of just assisting after other algorithms have
      triggered it. From Yuchung Cheng.

   7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot.

   8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert.

   9) Export MPLS packet stats via netlink, from Robert Shearman.

  10) Significantly improve inet port bind conflict handling, especially
      when an application is restarted and changes it's setting of
      reuseport. From Josef Bacik.

  11) Implement TX batching in vhost_net, from Jason Wang.

  12) Extend the dummy device so that VF (virtual function) features,
      such as configuration, can be more easily tested. From Phil
      Sutter.

  13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric
      Dumazet.

  14) Add new bpf MAP, implementing a longest prefix match trie. From
      Daniel Mack.

  15) Packet sample offloading support in mlxsw driver, from Yotam Gigi.

  16) Add new aquantia driver, from David VomLehn.

  17) Add bpf tracepoints, from Daniel Borkmann.

  18) Add support for port mirroring to b53 and bcm_sf2 drivers, from
      Florian Fainelli.

  19) Remove custom busy polling in many drivers, it is done in the core
      networking since 4.5 times. From Eric Dumazet.

  20) Support XDP adjust_head in virtio_net, from John Fastabend.

  21) Fix several major holes in neighbour entry confirmation, from
      Julian Anastasov.

  22) Add XDP support to bnxt_en driver, from Michael Chan.

  23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan.

  24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi.

  25) Support GRO in IPSEC protocols, from Steffen Klassert"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits)
  Revert "ath10k: Search SMBIOS for OEM board file extension"
  net: socket: fix recvmmsg not returning error from sock_error
  bnxt_en: use eth_hw_addr_random()
  bpf: fix unlocking of jited image when module ronx not set
  arch: add ARCH_HAS_SET_MEMORY config
  net: napi_watchdog() can use napi_schedule_irqoff()
  tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"
  net/hsr: use eth_hw_addr_random()
  net: mvpp2: enable building on 64-bit platforms
  net: mvpp2: switch to build_skb() in the RX path
  net: mvpp2: simplify MVPP2_PRS_RI_* definitions
  net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT
  net: mvpp2: remove unused register definitions
  net: mvpp2: simplify mvpp2_bm_bufs_add()
  net: mvpp2: drop useless fields in mvpp2_bm_pool and related code
  net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue'
  net: mvpp2: release reference to txq_cpu[] entry after unmapping
  net: mvpp2: handle too large value in mvpp2_rx_time_coal_set()
  net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set()
  net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set
  ...
2017-02-22 10:15:09 -08:00
Linus Torvalds
cdc194705d SCSI misc on 20170220
This update includes the usual round of major driver updates (ncr5380,
 ufs, lpfc, be2iscsi, hisi_sas, storvsc, cxlflash, aacraid,
 megaraid_sas, ).  There's also an assortment of minor fixes and the
 major update of switching a bunch of drivers to pci_alloc_irq_vectors
 from Christoph.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJYq5adAAoJEAVr7HOZEZN4bjUP/Atk7CSZVnC75pcYmncbEGCx
 ysOlEHK4uW2HhiAYk3PlYMk+pKrMHet2zsbbM9PHJfopdOHZ7Sq1+UZZVeqE1Zun
 8pe0NhON+fZx7XAnevdEvnSSULQZ+AGfjZO72iUwkJiN3ozYaFtCITOyn49l4GpR
 ra9emskBh7CQOFW2voGn1AKeDijPYGx3+TO4AUrWjVMiByR06gb1bmImx+ljiUrs
 jzRJPfrt90ORcTdpMateyN2EXxudcASMhX03SJ6fRI84hPAhMCROMbTv8RnzOTE4
 DPbnvbYUowlHt43iUhJHSwGdkRRaRBnkzQENBp1fNrNzZgF6vB7+kShxbonrYB2p
 gC4ewaJr0BNj+HsUnvTpe3WseiPOcfsnBsKilPLKBlm2dCKEXqFox/dj/T1uexxg
 HoyFrl3u8fyEqVHrzRS4M9t/njWh0NFmXxb0wBdj+lkVFTRErGSKQ8SfOqshuSGs
 P8NN88jy8vC7uqgzKBJ+UH3ehzn3qfBxasFHIC/e2awY9FqKjHGTxKMmSVpjXVxy
 wCvE2FQ3k/qEj2XSM6f7/NGytlSOlju5q1rFtHPW2M+TFSh0LJWCnmVjR/Zle9em
 pBWmtIgCv8W5b41zL2H94nLWAZbfdrrNU/XnX88l47LKnmorte/PGhpxu36NEsMS
 VCgreQmFMdMRY+WzDWl1
 =cBQx
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This update includes the usual round of major driver updates (ncr5380,
  ufs, lpfc, be2iscsi, hisi_sas, storvsc, cxlflash, aacraid,
  megaraid_sas, ...).

  There's also an assortment of minor fixes and the major update of
  switching a bunch of drivers to pci_alloc_irq_vectors from Christoph"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (188 commits)
  scsi: megaraid_sas: handle dma_addr_t right on 32-bit
  scsi: megaraid_sas: array overflow in megasas_dump_frame()
  scsi: snic: switch to pci_irq_alloc_vectors
  scsi: megaraid_sas: driver version upgrade
  scsi: megaraid_sas: Change RAID_1_10_RMW_CMDS to RAID_1_PEER_CMDS and set value to 2
  scsi: megaraid_sas: Indentation and smatch warning fixes
  scsi: megaraid_sas: Cleanup VD_EXT_DEBUG and SPAN_DEBUG related debug prints
  scsi: megaraid_sas: Increase internal command pool
  scsi: megaraid_sas: Use synchronize_irq to wait for IRQs to complete
  scsi: megaraid_sas: Bail out the driver load if ld_list_query fails
  scsi: megaraid_sas: Change build_mpt_mfi_pass_thru to return void
  scsi: megaraid_sas: During OCR, if get_ctrl_info fails do not continue with OCR
  scsi: megaraid_sas: Do not set fp_possible if TM capable for non-RW syspdIO, change fp_possible to bool
  scsi: megaraid_sas: Remove unused pd_index from megasas_build_ld_nonrw_fusion
  scsi: megaraid_sas: megasas_return_cmd does not memset IO frame to zero
  scsi: megaraid_sas: max_fw_cmds are decremented twice, remove duplicate
  scsi: megaraid_sas: update can_queue only if the new value is less
  scsi: megaraid_sas: Change max_cmd from u32 to u16 in all functions
  scsi: megaraid_sas: set pd_after_lb from MR_BuildRaidContext and initialize pDevHandle to MR_DEVHANDLE_INVALID
  scsi: megaraid_sas: latest controller OCR capability from FW before sending shutdown DCMD
  ...
2017-02-21 11:51:42 -08:00
Linus Torvalds
42e1b14b6e Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Implement wraparound-safe refcount_t and kref_t types based on
     generic atomic primitives (Peter Zijlstra)

   - Improve and fix the ww_mutex code (Nicolai Hähnle)

   - Add self-tests to the ww_mutex code (Chris Wilson)

   - Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr
     Bueso)

   - Micro-optimize the current-task logic all around the core kernel
     (Davidlohr Bueso)

   - Tidy up after recent optimizations: remove stale code and APIs,
     clean up the code (Waiman Long)

   - ... plus misc fixes, updates and cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  fork: Fix task_struct alignment
  locking/spinlock/debug: Remove spinlock lockup detection code
  lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS
  lkdtm: Convert to refcount_t testing
  kref: Implement 'struct kref' using refcount_t
  refcount_t: Introduce a special purpose refcount type
  sched/wake_q: Clarify queue reinit comment
  sched/wait, rcuwait: Fix typo in comment
  locking/mutex: Fix lockdep_assert_held() fail
  locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock()
  locking/rwsem: Reinit wake_q after use
  locking/rwsem: Remove unnecessary atomic_long_t casts
  jump_labels: Move header guard #endif down where it belongs
  locking/atomic, kref: Implement kref_put_lock()
  locking/ww_mutex: Turn off __must_check for now
  locking/atomic, kref: Avoid more abuse
  locking/atomic, kref: Use kref_get_unless_zero() more
  locking/atomic, kref: Kill kref_sub()
  locking/atomic, kref: Add kref_read()
  locking/atomic, kref: Add KREF_INIT()
  ...
2017-02-20 13:23:30 -08:00
Bart Van Assche
9294000d6d IB/srp: Drain the send queue before destroying a QP
A quote from the IB spec:

However, if the Consumer does not wait for the Affiliated Asynchronous
Last WQE Reached Event, then WQE and Data Segment leakage may occur.
Therefore, it is good programming practice to tear down a QP that is
associated with an SRQ by using the following process:
* Put the QP in the Error State;
* wait for the Affiliated Asynchronous Last WQE Reached Event;
* either:
  * drain the CQ by invoking the Poll CQ verb and either wait for CQ
    to be empty or the number of Poll CQ operations has exceeded CQ
    capacity size; or
  * post another WR that completes on the same CQ and wait for this WR to return as a WC;
* and then invoke a Destroy QP or Reset QP.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:55 -05:00
Bart Van Assche
f039f44fc3 IB/core: Add support for draining IB_POLL_DIRECT completion queues
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:55 -05:00
Bart Van Assche
b02c15360b IB/srp: Improve an error path
Avoid that the following message is printed if login fails:

scsi host0: ib_srp: Sending CM DREQ failed

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:54 -05:00
Bart Van Assche
a7139ca80f IB/srp: Make a diagnostic message more informative
Report the destination port GID if connecting fails.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:53 -05:00
Bart Van Assche
93c76dbb74 IB/srp: Document locking conventions
Use lockdep_assert_held() statements to verify at run-time
whether the proper locks are held.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:53 -05:00
Bart Van Assche
0a6fdbdeb1 IB/srp: Fix race conditions related to task management
Avoid that srp_process_rsp() overwrites the status information
in ch if the SRP target response timed out and processing of
another task management function has already started. Avoid that
issuing multiple task management functions concurrently triggers
list corruption. This patch prevents that the following stack
trace appears in the system log:

WARNING: CPU: 8 PID: 9269 at lib/list_debug.c:52 __list_del_entry_valid+0xbc/0xc0
list_del corruption. prev->next should be ffffc90004bb7b00, but was ffff8804052ecc68
CPU: 8 PID: 9269 Comm: sg_reset Tainted: G        W       4.10.0-rc7-dbg+ #3
Call Trace:
 dump_stack+0x68/0x93
 __warn+0xc6/0xe0
 warn_slowpath_fmt+0x4a/0x50
 __list_del_entry_valid+0xbc/0xc0
 wait_for_completion_timeout+0x12e/0x170
 srp_send_tsk_mgmt+0x1ef/0x2d0 [ib_srp]
 srp_reset_device+0x5b/0x110 [ib_srp]
 scsi_ioctl_reset+0x1c7/0x290
 scsi_ioctl+0x12a/0x420
 sd_ioctl+0x9d/0x100
 blkdev_ioctl+0x51e/0x9f0
 block_ioctl+0x38/0x40
 do_vfs_ioctl+0x8f/0x700
 SyS_ioctl+0x3c/0x70
 entry_SYSCALL_64_fastpath+0x18/0xad

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Steve Feeley <Steve.Feeley@sandisk.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:52 -05:00
Bart Van Assche
6cb72bc1b4 IB/srp: Avoid that duplicate responses trigger a kernel bug
After srp_process_rsp() returns there is a short time during which
the scsi_host_find_tag() call will return a pointer to the SCSI
command that is being completed. If during that time a duplicate
response is received, avoid that the following call stack appears:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: srp_recv_done+0x450/0x6b0 [ib_srp]
Oops: 0000 [#1] SMP
CPU: 10 PID: 0 Comm: swapper/10 Not tainted 4.10.0-rc7-dbg+ #1
Call Trace:
 <IRQ>
 __ib_process_cq+0x4b/0xd0 [ib_core]
 ib_poll_handler+0x1d/0x70 [ib_core]
 irq_poll_softirq+0xba/0x120
 __do_softirq+0xba/0x4c0
 irq_exit+0xbe/0xd0
 smp_apic_timer_interrupt+0x38/0x50
 apic_timer_interrupt+0x90/0xa0
 </IRQ>
RIP: srp_recv_done+0x450/0x6b0 [ib_srp] RSP: ffff88046f483e20

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Steve Feeley <Steve.Feeley@sandisk.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:52 -05:00
Bart Van Assche
d6c58dc40f IB/SRP: Avoid using IB_MR_TYPE_SG_GAPS
Tests have shown that the following error message is reported when
using SG-GAPS registration with an mlx5 adapter:

scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff880bd4270eb0
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 0f007806 2500002a ad9fafd1
scsi host1: ib_srp: reconnect succeeded
mlx5_0:dump_cqe:262:(pid 7369): dump error cqe
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 0f007806 25000032 00105dd0
scsi host1: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff880b92860138

Hence avoid using SG-GAPS memory registrations. Additionally,
always configure the blk_queue_virt_boundary() to avoid to trigger
a mapping failure when using adapters that support SG-GAPS (e.g.
mlx5).

Fixes: commit ad8e66b4a8 ("IB/srp: fix mr allocation when the device supports sg gaps")
Fixes: commit 509c5f33f4 ("IB/srp: Prevent mapping failures")
Reported-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mark Bloch <markb@mellanox.com>
Cc: Yuval Shaia <yuval.shaia@oracle.com>
Cc: <stable@vger.kernel.org> # 4.7+
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:51:41 -05:00
Christophe Jaillet
4cd33aafe4 RDMA/qedr: Fix some error handling
'qedr_alloc_pbl_tbl()' can not return NULL.

In qedr_init_user_queue():
 - simplify the test for the return value, no need to test for NULL
 - propagate the error pointer if needed, otherwise 0 (success) is returned.
   This is spurious.

In init_mr_info():
 - test the return value with IS_ERR
 - propagate the error pointer if needed instead of an exlictit -ENOMEM.
   This is a no-op as the only error pointer that we can have here is
   already -ENOMEM

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:27:29 -05:00
Arnd Bergmann
e57f774db1 RDMA/bnxt_re: add DCB dependency
When CONFIG_DCB is disabled, we get a link error:

drivers/infiniband/built-in.o: In function `bnxt_re_setup_qos':
trace.c:(.text+0x155774): undefined reference to `dcb_ieee_getapp_mask'
trace.c:(.text+0x155774): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `dcb_ieee_getapp_mask'
trace.c:(.text+0x155794): undefined reference to `dcb_ieee_getapp_mask'
trace.c:(.text+0x155794): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `dcb_ieee_getapp_mask'

Like the other drivers that use this function, a Kconfig dependency is
the correct fix.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:27:29 -05:00
Arnd Bergmann
3ecc16c82c IB/hns: include linux/module.h
I ran into a build error on arm64 randconfig testing:

infiniband/hw/hns/hns_roce_main.c:539:1: error: data definition has no type or storage class [-Werror]
infiniband/hw/hns/hns_roce_main.c:539:1: error: type defaults to 'int' in declaration of 'MODULE_DEVICE_TABLE' [-Werror=implicit-int]
infiniband/hw/hns/hns_roce_main.c:539:1: error: parameter names (without types) in function declaration [-Werror]
infiniband/hw/hns/hns_roce_main.c:979:226: error: data definition has no type or storage class [-Werror]
infiniband/hw/hns/hns_roce_main.c:979:226: error: type defaults to 'int' in declaration of 'module_init' [-Werror=implicit-int]
infiniband/hw/hns/hns_roce_main.c:979:1: error: parameter names (without types) in function declaration [-Werror]

Including the module.h makes it build again.

Fixes: 9a4435375c ("IB/hns: Add driver files for hns RoCE driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:27:28 -05:00
Yuval Shaia
c67294b70b IB/vmw_pvrdma: Expose vendor error to ULPs
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:27:28 -05:00
Christoph Hellwig
7bf3976d6c vmw_pvrdma: switch to pci_alloc_irq_vectors
.. and greatly clean up the irq handling boilerplate code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:27:20 -05:00
Arnd Bergmann
64b2ae74e8 IB/hfi1: use size_t for passing array length
gcc-7 produces a mysterious warning about the size argument being potentially out
of range:

drivers/infiniband/hw/hfi1/verbs.c: In function 'init_cntr_names':
drivers/infiniband/hw/hfi1/verbs.c:1644:2: error: 'memcpy': specified size between 18446744071562067968 and 18446744073709551615 exceeds maximum object size 9223372036854775807 [-Werror=stringop-overflow=]

This seems to refer to a the case where an 64-bit size_t gets truncated
into a negative 'int' and subsequently turned into a high 64-bit number
again.

The fix is clearly to use size_t here, which matches the type that gets
used for this value elsewhere.

Fixes: b7481944b0 ("IB/hfi1: Show statistics counters under IB stats interface")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:47 -05:00
Zhu Yanjun
23536dfaa3 IB/ipoib: Remove redudant label
There are 2 labels to mark the same statements. Replace the 2 labels
with one versatile labe.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:46 -05:00
Zhu Yanjun
c5e8f57b0b IB/ipoib: remove the unnecessary memory free
In the function ipoib_cm_nonsrq_init_rx, the memory is not
allocated successfully. It is not necessary to free it.

CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:46 -05:00
Christoph Hellwig
f50cccdd03 IB/mthca: switch to pci_alloc_irq_vectors
Trivial switch to the new API for this driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:45 -05:00
Michael J. Ruhl
1bb0d7b781 IB/hfi1: Code reuse with memdup_copy
Update several usages of kmalloc/user_copy to memdup_copy and
memdup_copy_nul.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:45 -05:00
Don Hiatt
832666c163 IB/hfi1, qib, rdmavt: Move AETH defines to rdma/ib_hdrs.h
Rename RVT AETH defines and export in rdma/ib_hdrs.h

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:44 -05:00
Don Hiatt
881fccb864 IB/hfi1: Add rvt_rnr_tbl_to_usec function
Return usec from an index into ib_rvt_rnr_table.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:44 -05:00
Michael J. Ruhl
db069ecb5d IB/hfi1: Do not set physical link state if DC is in the shutdown state
If the DC is in shutdown state, the set link state function will return
an error.  Since this is not a failure in this state, make sure to
only call set link state if the DC is on.

Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:43 -05:00
Jakub Byczkowski
c27aad00d1 IB/hfi1: Modify logging frequency of DCC errors
Use rate-limit state to limit number of messages logged
to kernel message buffer for DCC errors. Add new macro
dd_dev_info_ratelimited for that propose. Replace all
dd_dev_info calls in handle_dcc_err function with
rate-limited version.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:43 -05:00
Mike Marciniszyn
f9215b5e53 IB/rdmavt, IB/hfi1, IB/qib: Correct ack count for passive (RTR) QPs
The send complete for RC QPs mismanages the ack count when the
responder side is only in RTR.

A QP in that state cannot send requests, but it can be the target
for operations that elicit responses.

Adjust the RC completion logic to correct the count maintenance
by reflecting RECV_OK in a new state test.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:42 -05:00
Brian Welty
3fc4a0906f IB/qib: Updates to use rdmavt's SGE helper routines
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:42 -05:00
Brian Welty
1198fcea8a IB/hfi1, rdmavt: Move SGE state helper routines into rdmavt
To improve code reuse, add small SGE state helper routines to rdmavt_mr.h.
Leverage these in hfi1, including refactoring of hfi1_copy_sge.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:41 -05:00
Brian Welty
0128fceaf9 IB/hfi1, rdmavt: Update copy_sge to use boolean arguments
Convert copy_sge and related SGE state functions to use boolean.
For determining if QP is in user mode, add helper function in rdmavt_qp.h.
This is used to determine if QP needs the last byte ordering.
While here, change rvt_pd.user to a boolean.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:41 -05:00
Venkata Sandeep Dhanalakota
b4238e7057 IB/qib: Use new rdmavt timers
Reduce qib code footprint by using the rdmavt timers.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:40 -05:00
Venkata Sandeep Dhanalakota
56acbbfb46 IB/hfi1: Use new rdmavt timers
Reduce hfi1 code footprint by using the rdmavt timers.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:39 -05:00
Venkata Sandeep Dhanalakota
11a10d4bc7 IB/rdmavt: Adding timer logic to rdmavt
To move common code across target to rdmavt for code reuse.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:39 -05:00
Brian Welty
696513e8cf IB/hfi1, qib, rdmavt: Move AETH credit functions into rdmavt
Add rvt_compute_aeth() and rvt_get_credit() as shared functions in
rdmavt, moved from hfi1/qib logic.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:38 -05:00
Brian Welty
beb5a04267 IB/hfi1, qib, rdmavt: Move two IB event functions into rdmavt
Add rvt_rc_error() and rvt_comm_est() as shared functions in
rdmavt, moved from hfi1/qib logic.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:38 -05:00
Sebastian Sanchez
c03c08d50b IB/hfi1: Check upper-case EFI variables
The EFI variable that provides board ID is named
by the PCI address of the device, which is published
in upper-case, while the HFI1 driver reads the EFI
variable in lower-case.
This prevents returning the correct board id when
queried through sysfs. Read EFI variables in
upper-case if the lower-case read fails.

Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:37 -05:00
Sebastian Sanchez
76327627be IB/hfi1: Reduce oversized fields in struct hfi1_packet
Some fields in struct hfi1_packet are oversized.
Reduce them.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:37 -05:00
Mike Marciniszyn
d7c76e91aa IB/hfi1: Add additional fields to qp_stats
The r_psn and s_rnr_retry are missing.

Add with this patch.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:36 -05:00
Sebastian Sanchez
b448bf9a0d IB/hfi1: Allocate context data on memory node
There are some memory allocation calls in hfi1_create_ctxtdata()
that do not use the numa function parameter. This
can cause cache lines to be filled over QPI.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:36 -05:00
Sebastian Sanchez
338adfdddf IB/rdmavt: Use per-CPU reference count for MRs
Having per-CPU reference count for each MR prevents
cache-line bouncing across the system. Thus, it
prevents bottlenecks. Use per-CPU reference counts
per MR.

The per-CPU reference count for FMRs is used in
atomic mode to allow accurate testing of the busy
state. Other MR types run in per-CPU mode MR until
they're freed.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:35 -05:00
Sebastian Sanchez
f3e862cb68 IB/hfi1: Access hfi1_ibport through rcd pointer
Receive code paths use the QP's device and port
number to access the struct hfi1_ibport. When an
instance of struct hfi1_ctxtdata is present, it can
be used to access struct hfi1_ibport through a pointer.
This makes struct hfi1_ibport lookup time faster as an
array doesn't have to be indexed and access fields in
other cache-lines.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:35 -05:00
Mike Marciniszyn
a8715b97d6 IB/hfi1: Correct error calldown locking
The resource specific wait locking missed correcting the lock
for the notify_error_qp() calldown.

The code is fixed to correctly use the iowait lock field to protect
the head that is protected by that lock.

Fixes: Commit 4e045572e2 ("IB/hfi1: Add unique txwait_lock for txreq events")
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:34 -05:00
Easwar Hariharan
39e2afa8d0 IB/hfi1: Use static CTLE with Preset 6 for integrated HFIs
After extended testing, it was found that the previous PCIe Gen
3 recipe, which used adaptive CTLE with Preset 4, could cause an
NMI/Surprise Link Down in about 1 in 100 to 1 in 1000 power cycles on
some platforms. New EV data combined with extensive empirical data
indicates that the new recipe should use static CTLE with Preset 6 for
all integrated silicon SKUs.

Fixes: c3f8de0b33 ("IB/hfi1: Add static PCIe Gen3 CTLE tuning")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:34 -05:00
Mike Marciniszyn
eb04ff09d8 IB/hfi1: Ensure read of producer s_head is correct
The read of s_head in the hfi1_make_rc_req() and
qib_make_rc_req() lack the necesary barrier instuctions.

Correct other ACCESS_ONCE() warnings in the same file.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:33 -05:00
Mike Marciniszyn
a82a7fcd1f IB/hfi1: Process qp wait list in IRQ thread periodically
In the event that the IRQ thread is extremely busy, the
processing of an rcd wait list can be delayed by quite
a bit until the IRQ thread completes its work.

The QP reset reference count wait can then appear to be stuck, thus
causing up a QP destroy to emit the hung task diagnostic.

Fix by processing the qp wait list periodically from the thread.  The
interval is a multiple (currently 4) of the MAX_PKT_RECV.

Also, reduce some of the excessive inlining.   The guidelines
are per packet is ok inline, otherwise the choice is based on
likelyhood of execution.

Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:32 -05:00
Mike Marciniszyn
4fcf1de5a7 IB/hfi1: Correct defered count after processing qp_wait_list
The qp_wait_list processing leaves the defered ack count
at its prior value.

This can result in a premature send of an ack.

Fixed by unconditionally reseting the defered ack count
in hfi1_send_rc_ack().

Fixes: Commit 7c091e5c06 ("staging/rdma/hfi1: add ACK coalescing logic")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:32 -05:00
Wei Yongjun
8d8a473380 IB/rxe: use setup_timer to simplify the code
Use setup_timer function instead of initializing timer with the function
and data fields.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:31 -05:00
Max Gurtovoy
32f8e839ed IB/iser: Protect completion context active_qps update
As iser connections can share completion contexts, we need
to protect the active_qps update each time we set it.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:31 -05:00
Ganesh Goudar
192539f4ce iw_cxgb4: clean up send_connect()
Clean up send_connect() and make use of t6 specific
active open request struct.

Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Bharat Teja <bharat@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19 09:18:30 -05:00
Doug Ledford
6dd7abae71 Merge branch 'k.o/for-4.10-rc' into HEAD 2017-02-19 09:18:21 -05:00
Moni Shoua
6df6b4a9ce IB/cma: Destination and source addr families must match
The destination address in a listening rdma_id does not have an address
family. Since address family in both sides of a connection must be the
same in rdma_bind_addr() we set the address family of the destination to
the address family of the source.

This patch serves the logic in cma_port_is_unique() which requires to
know if destination address that is associated with a rdma_id is any address
(cma_zero_addr() and cma_loopback_addr()).

This can happen when port reuse is checked for a port number
that is being listened to.

Fixes: 19b752a19d ("IB/cma: Allow port reuse for rdma_id")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-15 09:51:33 -05:00
Majd Dibbiny
89052d784b IB/cma: Add default RoCE TOS to CMA configfs
Add new entry to the RDMA-CM configfs that allows users
to select default TOS for RDMA-CM QPs.

This is useful for users that want to control the TOS for legacy
applications without changing their code.

Application that sets the TOS explicitly using the rdma_set_option
API will continue to work as expected, meaning overriding the configfs
value.

CC: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-15 09:51:28 -05:00
Parav Pandit
5903960840 IB/core: Remove pointer casting from void to net_device
This patch avoids unnecessary type casting from void to net_device.

CC: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-15 09:51:28 -05:00
Erez Shitrit
2b0841766a IB/IPoIB: Add destination address when re-queue packet
When sending packet to destination that was not resolved yet
via path query, the driver keeps the skb and tries to re-send it
again when the path is resolved.

But when re-sending via dev_queue_xmit the kernel doesn't call
to dev_hard_header, so IPoIB needs to keep 20 bytes in the skb
and to put the destination address inside them.

In that way the dev_start_xmit will have the correct destination,
and the driver won't take the destination from the skb->data, while
nothing exists there, which causes to packet be be dropped.

The test flow is:
1. Run the SM on remote node,
2. Restart the driver.
4. Ping some destination,
3. Observe that first ICMP request will be dropped.

Fixes: fc791b6335 ("IB/ipoib: move back IB LL address into the hard header")
Cc: <stable@vger.kernel.org> # v4.8+
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Tested-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-15 09:51:28 -05:00
Eli Cohen
cdbe33d0f8 IB/mlx5: Fix configuration of port capabilities
When the "ib_virt" cap is set, configuration of port capabilities need
to be done through mlx5_core_modify_hca_vport_context.
Since modify_hca_vport_context accepts mask and value, there is no need
to read the port capabilities and calculate the new cap values so we
avoid the mutex when ib_virt is set.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-15 09:29:37 -05:00
Talat Batheesh
a748d60df3 IB/mlx4: Take source GID by index from HW GID table
Previously, we used the HW GID index in order to search the source GID
in the software GID cached table. In some cases, for example when
the MAC Address of the network interface is changed, the GID cached table
saves the old-IPv6-link-local GID at the end of the table.

When returning the old MAC address, the software GID cached table tries
to add the new IPv6-link-local GID, and when it identifies that the GID
already exists, the software GID cached does not add it. Thus a mismatch
occurs between the HW and the SW GID tables.

It resulted with sending traffic with the wrong source GID.

This commit fixes the issue by taking both from the HW table.

The problem can be reproduced with the following scenario:
Client:
    # ifconfig ens6 2.2.2.5
    # ifconfig ens6 inet6 add 2001:0db8:0:f101::5/64
    # ifconfig ens6 hw ether f4:52:14:61:a0:71
    # ifconfig ens6 inet6 del 2001:0db8:0:f101::5/64
    # ifconfig ens6 inet6 add 2001:0db8:0:f101::5/64
    # ucmatose -f ipv6 -b 2001:0db8:0:f101::5 -s 2001:0db8:0:f101::6 -p 20156
Server:
    # ucmatose -f ipv6 -b 2001:0db8:0:f101::6 -p 20156

Fixes: 4c3eb3ca13 ('IB/mlx4: Add VLAN support for IBoE')
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:44:42 -05:00
Eli Cohen
d8030b0de0 IB/mlx5: Fix blue flame buffer size calculation
A blue flame register is comprised of two buffers of equal size.

Fixes: 5fe9dec0d0 ("IB/mlx5: Use blue flame register allocator in mlx5_ib")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:44:42 -05:00
Leon Romanovsky
850b741514 IB/mlx4: Remove unused variable from function declaration
Remove unused netw_view parameter from eth_link_query_port() function.

Reported-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:44:42 -05:00
Or Gerlitz
c4550c63b3 IB: Query ports via the core instead of direct into the driver
Change the drivers to call ib_query_port in their get port
immutable handler instead of their own query port handler.

Doing this required to set the core cap flags of this device
before the ib_query_port call is made, since the IB core might
need these caps to serve the port query.

Drivers are ensured by the IB core that the port attributes passed
to the port query verb implementation are zero, and hence we
removed the zeroing from the drivers.

This patch doesn't add any new functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:22 -05:00
Or Gerlitz
ce1e055fb9 IB: Add protocol for USNIC
Add protocol definition for the proprietary the USNIC driver.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:21 -05:00
Or Gerlitz
bc63f9d558 IB/mlx4: Support raw packet protocol
Mark support for the new raw packet protocol on Eth ports.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:21 -05:00
Or Gerlitz
72cd57178f IB/mlx5: Support raw packet protocol
Mark support for the new raw packet protocol on Eth ports.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:20 -05:00
Artemy Kovalyov
81713d3788 IB/mlx5: Add implicit MR support
Add implicit MR, covering entire user address space.
The MR is implemented as an indirect KSM MR consisting of
1GB direct MRs.
Pages and direct MRs are added/removed to MR by ODP.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:19 -05:00
Artemy Kovalyov
49780d42df IB/mlx5: Expose MR cache for mlx5_ib
Allow other parts of mlx5_ib to use MR cache mechanism.
* Add new functions mlx5_mr_cache_alloc and mlx5_mr_cache_free
* Traditional MTT MKey buckets are limited by MAX_UMR_CACHE_ENTRY
  Additinal buckets may be added above.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:18 -05:00
Artemy Kovalyov
94990b4989 IB/mlx5: Add null_mkey access
Add mlx5_cmd_null_mkey() function to access null_mkey information
from firmware.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:18 -05:00
Artemy Kovalyov
d9d0674c0f IB/umem: Indicate that process is being terminated
When process is killed while pagefault operation still in progress -
function will fail. In this specific case we don't want any warnings in
dmesg to avoid log analyzers false alerts. So we need distinct error
code for this case.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:17 -05:00
Artemy Kovalyov
d07d1d70ce IB/umem: Update on demand page (ODP) support
Currently ODP MR may explicitly register virtual address space area
of limited length.
This change allows MR to cover entire process virtual address space
dynamicaly adding/removing translation entries to device MTT.

Add following changes to support implicit MR:
* Allow umem to be zero size to back-up implicit MR.
* Add new function ib_alloc_odp_umem() to add virtual memory regions
  to implicit MR dynamically on demand.
* Add new function rbt_ib_umem_lookup() to find dynamically added
  virtual memory regions.
* Expose function rbt_ib_umem_for_each_in_range() to other modules and
  make it safe

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:17 -05:00
Noa Osherovich
4be6da1e5b IB/mlx5: Support creation of a WQ with scatter FCS offload
Add support for creation of a WQ with scatter FCS capability, if
this capability is supported by the hardware.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:15 -05:00
Noa Osherovich
e4cc4fa7cc IB/mlx5: Enable QP creation with cvlan offload
Enable creating a RAW Ethernet QP with cvlan stripping offload when
it's supported by the hardware.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:15 -05:00
Noa Osherovich
b1f74a8437 IB/mlx5: Enable WQ creation and modification with cvlan offload
Allow creating a WQ with cvlan stripping considering device's
capabilities. The default value was fixed to disable vlan stripping
till was asked explicitly.

In addition, allow modification of a WQ to turn on/off this property.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:14 -05:00
Noa Osherovich
e816133440 IB/mlx5: Expose vlan offloads capabilities
Check device's capabilities and report which raw packet capabilities
are supported.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:14 -05:00
Noa Osherovich
9e1b161f3b IB/uverbs: Enable QP creation with cvlan offload
Enable user applications to create a QP with cvlan stripping offload.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:13 -05:00
Noa Osherovich
af1cb95d2e IB/uverbs: Enable WQ creation and modification with cvlan offload
Enable user space application via WQ creation and modification to
turn on and off cvlan offload.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:12 -05:00
Noa Osherovich
5f23d4265f IB/uverbs: Expose vlan offloads capabilities
Expose raw packet capabilities to user space as part of query device.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:12 -05:00
Majd Dibbiny
23a6964e3a IB/mlx5: Add port counter support for Receive WQs
Counters weren't updated due to Receive WQs' traffic since the
counter-id was not associated with the RQ.

Added support for associating the q-counter-id with the Receive WQ.
The attachment is done only when changing WQ's state from RESET to
READY in modify-WQ command.

FW support is required for the above, without this support
Receive WQ counters will not count.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:41:09 -05:00
Kamal Heib
7c16f47779 IB/mlx5: Expose Q counters groups only if they are supported by FW
This patch modify the Q counters implementation, so each one of the
three Q counters groups will be exposed by the driver only if they are
supported by the firmware.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 11:40:56 -05:00
Leon Romanovsky
1ffd3a26f8 IB/mlx5: Replace ENOTSUPP usage with EOPNOTSUPP
Flow steering is supposed to return EOPNOTSUPP error
for unsupported fields and not ENOTSUPP error.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:21:01 -05:00
Moses Reuben
2ac693f995 IB/mlx5: Add flow tag support
Set flow tag in flow table entry, when IB_FLOW_SPEC_ACTION_TAG
is part of the flow specifications.

Flow tag doesn't support multicast flows, so it's passing to
hardware only when used.

Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:21:01 -05:00
Moses Reuben
94e03f11ad IB/uverbs: Add support for flow tag
The struct ib_uverbs_flow_spec_action_tag associates a tag_id with the
flow defined by any number of other flow_spec entries which can reference
L2, L3, and L4 packet contents.

Use of ib_uverbs_flow_spec_action_tag allows the consumer to identify
the set of rules which where matched by
the packet by examining the tag_id in the CQE.

Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:21:01 -05:00
Leon Romanovsky
5abb0da9cd IB/mlx5: Remove deprecated module parameter
Commit 9603b61de1 ("mlx5: Move pci device handling from mlx5_ib
to mlx5_core") moved prof_sel module parameter from mlx5_ib to mlx5_core
and marked it as deprecated in 2014.

Three years after deprecation, it is time to remove the deprecated
module parameter.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:14:25 -05:00
Majd Dibbiny
ed88451e1f IB/mlx5: Assign DSCP for R-RoCE QPs Address Path
For Routable RoCE QPs, the DSCP should be set in the QP's
address path.

The DSCP's value is derived from the traffic class.

Fixes: 2811ba51b0 ("IB/mlx5: Add RoCE fields to Address Vector")
Cc: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:14:25 -05:00
Maor Gottlieb
1e0e50b617 IB/mlx5: Avoid SMP MADs from VFs
According to the device specification, we need to check that the
has_smi bit is set in vport context before allowing send SMP
MADs from VF.

Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:14:25 -05:00
Maor Gottlieb
c43f1112c0 IB/mlx5: Add additional checks before processing MADs
Check the has_smi bit in vport context and class version of MADs
before allowing MADs processing to take place.
MAD_IFC SMI commands can be executed only if smi bit is set.

Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Parvi Kaustubhi <parvik@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:14:25 -05:00
Kamal Heib
45bded2c21 IB/mlx5: Verify that Q counters are supported
Make sure that the Q counters are supported by the FW before trying
to allocate/deallocte them, this will avoid driver load failure when
they aren't supported by the FW.

Fixes: 0837e86a7a ('IB/mlx5: Add per port counters')
Cc: <stable@vger.kernel.org> # v4.7+
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:14:25 -05:00
Leon Romanovsky
12bbf1ea7e IB/mlx5: Return error for unsupported signature type
In case of unsupported singature, we returned positive
value, while the better approach is to return -EINVAL.

In addition, in this change, the error print is enriched
to provide an actual supplied signature type.

Fixes: e6631814fb ("IB/mlx5: Support IB_WR_REG_SIG_MR")
Cc: Sagi Grimberg <sagi@grimberg.me>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:14:25 -05:00
Leon Romanovsky
0fd27a88c2 IB/mlx5: Fix out-of-bound access
When we initialize buffer to create SRQ in kernel,
the number of pages was less than actually used in
following mlx5_fill_page_array().

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 10:14:25 -05:00
Selvin Xavier
592e8b3226 RDMA/bnxt_re: Add bnxt_re driver build support
Makefile and Kconfig changes for enabling bnxt_re compilation

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 09:51:28 -05:00
Selvin Xavier
1ac5a40479 RDMA/bnxt_re: Add bnxt_re RoCE driver
This patch introduces the RoCE driver for the Broadcom
NetXtreme-E 10/25/40/50G RoCE HCAs.

The RoCE driver is a two part driver that relies on the parent
bnxt_en NIC driver to operate.  The changes needed in the bnxt_en
driver have already been incorporated via Dave Miller's net tree
into the mainline kernel.

The vendor official git repository for this driver is available
on github as:
https://github.com/Broadcom/linux-rdma-nxt/

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14 09:51:15 -05:00
David S. Miller
35eeacf182 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-11 02:31:11 -05:00
Eyal Itkin
647bf3d8a8 IB/rxe: Fix mem_check_range integer overflow
Update the range check to avoid integer-overflow in edge case.
Resolves CVE 2016-8636.

Signed-off-by: Eyal Itkin <eyal.itkin@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-08 12:28:30 -05:00
Eyal Itkin
628f07d33c IB/rxe: Fix resid update
Update the response's resid field when larger than MTU, instead of only
updating the local resid variable.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Eyal Itkin <eyal.itkin@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-08 12:28:30 -05:00
David S. Miller
501ec18757 mlx5-updates-2017-01-31
This series includes some updates to mlx5 core and ethernet driver.
 
 We got one patch from Or to fix some static checker warnings.
 
 2nd patche from Dan came to add the support for 128B cache line
 in the HCA, which will configures the hardware to use 128B alignment only
 on systems with 128B cache lines, otherwise it will be kept as the current
 default of 64B.
 
 From me three patches to support no inline copy on TX on ConnectX-5 and
 later HCAs.  Starting with two small infrastructure changes and
 refactoring patches followed by two patches to add the actual support for
 both xmit ndo and XDP xmit routines.
 Last patch is a simple fix to return a mistakenly removed pointer from the
 SQ structure, which was remove in previous submission of mlx5 4K UAR.
 
 Saeed.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJYmQKaAAoJEEg/ir3gV/o+RBMH/RGHNw3yPB2MyWo28V3eabw+
 xl/SymiNOUgmq03ULYoc6xJpi9RCya7m/Kyce1M/M1gSz6LXubG2IDw9QsKV8lnc
 +5rwHCKjop6MdR3khsgqvWqGiKfQN0+QON5MjlPZB3/4u8qFcjauhfXpiX9naMO5
 aB/Sm9zRPwRnsEhy2AwPyZqOxe5boZzHqmZxpthIgPMtqbpBYNkTkooljsj/KqXf
 AO3y/mdGykELPF3lIHTE4X9zixx5s6MrlAYX2uGUrAojs2WVIBsq3iXI/J8X9zs/
 lg7to15WoMttR66vRZ120U6tx17OMmoxuAp+bmgZumabi/wDAZGSy5ELbH28WlY=
 =F+t/
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2017-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2017-01-31

This series includes some updates to mlx5 core and ethernet driver.

We got one patch from Or to fix some static checker warnings.

2nd patche from Dan came to add the support for 128B cache line
in the HCA, which will configures the hardware to use 128B alignment only
on systems with 128B cache lines, otherwise it will be kept as the current
default of 64B.

From me three patches to support no inline copy on TX on ConnectX-5 and
later HCAs.  Starting with two small infrastructure changes and
refactoring patches followed by two patches to add the actual support for
both xmit ndo and XDP xmit routines.
Last patch is a simple fix to return a mistakenly removed pointer from the
SQ structure, which was remove in previous submission of mlx5 4K UAR.

Saeed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 13:44:08 -05:00
Christoph Hellwig
b6a05c823f scsi: remove eh_timed_out methods in the transport template
Instead define the timeout behavior purely based on the host_template
eh_timed_out method and wire up the existing transport implementations
in the host templates.  This also clears up the confusion that the
transport template method overrides the host template one, so some
drivers have to re-override the transport template one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-02-06 19:10:03 -05:00
Parav Pandit
d0d7b10b05 net-next: treewide use is_vlan_dev() helper function.
This patch makes use of is_vlan_dev() function instead of flag
comparison which is exactly done by is_vlan_dev() helper function.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jon Maxwell <jmaxwell37@gmail.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 16:33:29 -05:00
Saeed Mahameed
2b31f7ae5f net/mlx5: TX WQE update
Add new TX WQE fields for Connect-X5 vlan insertion support,
type and vlan_tci, when type = MLX5_ETH_WQE_INSERT_VLAN the
HW will insert the vlan and prio fields (vlan_tci) to the packet.

Those bits and the inline header fields are mutually exclusive, and
valid only when:
MLX5_CAP_ETH(mdev, wqe_inline_mode) == MLX5_CAP_INLINE_MODE_NOT_REQUIRED
and MLX5_CAP_ETH(mdev, wqe_vlan_insert),
who will be set in ConnectX-5 and later HW generations.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2017-02-06 18:20:16 +02:00
David S. Miller
4e8f2fc1a5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two trivial overlapping changes conflicts in MPLS and mlx5.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-28 10:33:06 -05:00
Yuval Shaia
24dc831b77 IB/core: Add inline function to validate port
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-27 14:33:59 -05:00
Bart Van Assche
2bce1a6d22 IB/srpt: Accept GUIDs as port names
Port and ACL information must be configured before an initiator
logs in.  Make it possible to configure this information before
a subnet prefix has been assigned to a port by not only accepting
GIDs as target port and initiator port names but by also accepting
port GUIDs.

Add a 'priv' member to struct se_wwn to allow target drivers to
associate their own data with struct se_wwn.

Reported-by: Doug Ledford <dledford@redhat.com>
References: http://www.spinics.net/lists/linux-rdma/msg39505.html
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-27 14:31:50 -05:00
Christophe Jaillet
a3dd3a48a5 IB/cma: Fix reversed test
This test looks reverted.
We should log an error message only if 'ib_attach_mcast()' fails.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-27 14:29:20 -05:00
Jack Morgenstein
b4cfe3971f RDMA/cma: Fix unknown symbol when CONFIG_IPV6 is not enabled
If IPV6 has not been enabled in the underlying kernel, we must avoid
calling IPV6 procedures in rdma_cm.ko.

This requires using "IS_ENABLED(CONFIG_IPV6)" in "if" statements
surrounding any code which calls external IPV6 procedures.

In the instance fixed here, procedure cma_bind_addr() called
ipv6_addr_type() -- which resulted in calling external procedure
__ipv6_addr_type().

Fixes: 6c26a77124 ("RDMA/cma: fix IPv6 address resolution")
Cc: <stable@vger.kernel.org> # v4.2+
Cc: Spencer Baugh <sbaugh@catern.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-27 14:29:04 -05:00
Zhu Yanjun
5c37077fd0 IB/ipoib: Remove the unnecessary error check
The function ipoib_mcast_start_thread/ipoib_ib_dev_up always return zero.
As such, in the function ipoib_open, err_stop will never be reached.
So remove this err_stop and change the return type of the function
ipoib_mcast_start_thread/ipoib_ib_dev_up to void.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 16:22:24 -05:00
Shiraz Saleem
3f9fade5e7 i40iw: Set maj_err and min_err in i40iw_sc_cqp_create
Set maj_err and min_err in i40iw_sc_cqp_create so that it
returns correct values for all return cases. This also
addresses an uninitialized variable warning for maj_err and
min_err in i40iw_create_cqp.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 16:20:37 -05:00
Leon Romanovsky
564649b4ea IB/qib: Remove empty function
Commit f06267104d ("RDMA: Update workqueue usage") removed
content of qib_qsfp_deinit(...) and left it empty.

This patch deletes all leftovers of that function.

Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 16:20:37 -05:00
Jack Wang
21d6454a39 RDMA/core: create struct ib_port_cache
As Jason suggested, we have 4 elements for per port arrays,
it's better to have a separate structure to represent them.

It simplifies code a bit, ~ 30 lines of code less :)

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Michael Wang <yun.wang@profitbricks.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 16:20:37 -05:00
Zhu Yanjun
dfc0e55506 IB/ipoib: function interface change
The ipoib_ib_dev_down/ipoib_ib_dev_stop return zero unconditionally
and the callers never check the returned values,
change the return type to void and remove the redundant return values.

Reviewed-by: Shan Hai <shan.hai@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 16:20:37 -05:00
Dan Carpenter
820cd30ac2 i40iw: fix some indenting in i40iw_sc_vsi_init()
The debug printk was indented more than it should have been and we
can remove an unnecessary line break.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 16:20:37 -05:00
Moni Shoua
19b752a19d IB/cma: Allow port reuse for rdma_id
When allocating a port number for binding to a rdma_id, assuming the
allocation is not for a specific port, the rule is to allow only ports
that were not in use before by any other rdma_id.

This condition is too strong to achieve the goal of a unique 5 tuple
rdma_id. Instead, we can compare current rdma_id with other rdma_id for
difference in one of destination port, source address and destination
address to allow port reuse.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 16:20:37 -05:00
Moni Shoua
498683c6a7 IB/cma: Add debug messages to error flows
Print debug messages to the kernel log to add more
information about RDMA_CM events that indicate an error.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 16:20:37 -05:00
Zhu Yanjun
f7534f45dc IB/ipoib: Remove unnecessary returned value check
In the function ipoib_set_dev_features, the returned value is always 0.
As such, it is not necessary to check the returned value.
This is not a bug. It is a trivial problem.

Reviewed-by: Guanglei Li <guanglei.li@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 16:20:37 -05:00
Colin Ian King
506f71d181 IB/isert: fix spelling mistake: "teminating" -> "terminating"
Trivial fix to spelling mistake in isert_warn message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 16:20:37 -05:00
Yonatan Cohen
2d4b21e0a2 IB/rxe: Prevent from completer to operate on non valid QP
On UD QP completer tasklet is scheduled for each packet sent.

If it is followed by a destroy_qp(), the kernel panic will
happen as the completer tries to operate on a destroyed QP.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 16:17:32 -05:00
Maor Gottlieb
f39f775218 IB/rxe: Fix rxe dev insertion to rxe_dev_list
The first argument of list_add_tail is the new item and the second
is the head of the list. Fix the code to pass arguments in the
right order, otherwise not all the rxe devices will be removed
during teardown.

Fixes: 8700e3e7c4 ('Soft RoCE driver')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 16:17:25 -05:00
Kenneth Lee
828f6fa65c IB/umem: Release pid in error and ODP flow
1. Release pid before enter odp flow
2. Release pid when fail to allocate memory

Fixes: 87773dd56d ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get")
Fixes: 8ada2c1c0c ("IB/core: Add support for on demand paging regions")
Signed-off-by: Kenneth Lee <liguozhu@hisilicon.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 15:44:31 -05:00
Ram Amrani
f449c7a2d8 RDMA/qedr: Dispatch port active event from qedr_add
Relying on qede to trigger qedr on startup is problematic. When probing
both if qedr loads slowly then qede can assume qedr is missing and not
trigger it. This patch adds a triggering from qedr and protects against
a race via an atomic bit.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 15:35:08 -05:00
Ram Amrani
9c1e0228ab RDMA/qedr: Fix and simplify memory leak in PD alloc
Free the PD if no internal resources were available. Move userspace
code under the relevant 'if'.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 15:35:07 -05:00
Ram Amrani
af2b14b8b8 RDMA/qedr: Fix RDMA CM loopback
The loopback logic in RDMA CM packets compares Ethernet addresses and
was accidently inverse.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 15:35:02 -05:00
Ram Amrani
1a59075197 RDMA/qedr: Fix formatting
Remove standalone ';'.  List function's parameters in a single line.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 15:35:01 -05:00
Ram Amrani
27a4b1a6d6 RDMA/qedr: Mark three functions as static
mark qedr_get_state_from_ibqp(), __qedr_alloc_mr() and __qedr_post_send()
as static since they are only used in the same file.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 15:34:56 -05:00
Ram Amrani
933e6dcaa0 RDMA/qedr: Don't reset QP when queues aren't flushed
Fail QP state transition from error to reset if SQ/RQ are not empty
and still in the process of flushing out the queued work entries.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 15:34:55 -05:00
Ram Amrani
c78c314961 RDMA/qedr: Don't spam dmesg if QP is in error state
It is normal to flush CQEs if the QP is in error state. Hence there's no
use in printing a message per CQE to dmesg.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 15:34:54 -05:00
Ram Amrani
91bff997db RDMA/qedr: Remove CQ spinlock from CM completion handlers
There is only a single event queue that triggers the completion
events for the RDMA CM and it is being processed serially. This means
that inherently there can no parallelism of CQ completion handler
callbacks, hence the lock is redundant.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 15:34:43 -05:00
Ram Amrani
59e8970b37 RDMA/qedr: Return max inline data in QP query result
Return the maximum supported amount of inline data, not the qp's current
configured inline data size, when filling out the results of a query
qp call.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 15:34:37 -05:00
Ram Amrani
865cea40b6 RDMA/qedr: Return success when not changing QP state
If the user is requesting us to change the QP state to the same state
that it is already in, return success instead of failure.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 15:34:36 -05:00
Amrani, Ram
097b615965 RDMA/qedr: Fix MTU returned from QP query
MTU value returned from QP query should include overhead.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 15:34:30 -05:00
Amrani, Ram
d3f4aadd61 RDMA/core: Add the function ib_mtu_int_to_enum
As the functionality to convert the MTU from a number to enum_ib_mtu
is ubiquitous, define a dedicated function and remove the duplicated
code.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 15:34:22 -05:00
Ganesh Goudar
bab572f1d4 iw_cxgb4: Guard against null cm_id in dump_ep/qp
Endpoints that are aborting can have already dereferenced the
cm_id and set ep->com.cm_id to NULL.  So guard against that in
dump_ep() and dump_qp().

Also create a common function for setting up ip address pointers
since the same logic is needed in several places.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 14:44:01 -05:00
Yuval Shaia
f57e8ca50e IB/mad: Add port_num to error message
Print the invalid port number to ease troubleshooting.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 14:20:42 -05:00
Yuval Shaia
1dd70ea360 IB/vmw_pvrdma: Remove unused qp_type
Remove the unused qp_type parameter from function's args

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 14:20:42 -05:00
Yuval Shaia
6c6e51a617 IB/core: Fix typo in comment
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 14:19:48 -05:00
Adit Ranadive
ff89b070b7 IB/vmw_pvrdma: Fix incorrect cleanup on pvrdma_pci_probe error path
If the interrupt allocation failed we should start freeing the CQ rings
rather than unregistering the netdev notifier.

Fixes: 29c8d9eba5 ("IB: Add vmw_pvrdma driver")
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 14:15:28 -05:00
Adit Ranadive
7d211c81e9 IB/vmw_pvrdma: Don't leak info from alloc_ucontext
Clear out the user response struct correctly.

Fixes: 29c8d9eba5 ("IB: Add vmw_pvrdma driver")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 14:15:28 -05:00
Or Gerlitz
7898489880 IB/mlx5: Enable Eth VFs to query their min-inline value for user-space
For some mlx5 HW models (CX4, CX4Lx), the VF driver needs to put part
of the packet headers on the TX descriptor so the e-switch can do proper
matching and steering. This is called "min-inline", it's advertized to
the VF by the FW and also enforced on them by the HW, such that if they
don't obey, their packets are dropped.

SRIOV VF libmlx5 instances should take into account the min-inline
value of their vports. For that end, we provide this value through
the vendor response part of init_ucontext command.

The min inline value is reported in a way which will let newer libmlx5
instances realize that they are running over an older kernel and act
accordingly (e.g apply some educated guess).

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24 21:14:06 +02:00
Bart Van Assche
0bbb3b7496 IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it
Make the rxe and rdmavt drivers use dma_virt_ops. Update the
comments that refer to the source files removed by this patch.
Remove struct ib_dma_mapping_ops. Remove ib_device.dma_ops.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Andrew Boyer <andrew.boyer@dell.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Jonathan Toppins <jtoppins@redhat.com>
Cc: Alex Estrin <alex.estrin@intel.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:31:32 -05:00
Bart Van Assche
99db949403 IB/core: Remove ib_device.dma_device
Add code in ib_register_device() for copying the DMA masks. Use
&ib_device.dev in DMA mapping operations instead of dma_device.
Remove ib_device.dma_device because due to this and previous patches
it is no longer used.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:26:17 -05:00
Bart Van Assche
e3dfa60c0a IB/srpt: Modify a debug statement
Since a later patch will remove ib_device.dma_device and since knowing
the value of that pointer is not too important, remove dma_device from
the debug output.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:26:17 -05:00
Bart Van Assche
dee2b82a5f IB/srp: Switch from dma_device to dev.parent
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:26:17 -05:00
Bart Van Assche
61118cecf2 IB/iser: Switch from dma_device to dev.parent
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:26:17 -05:00
Bart Van Assche
db97ed0a2e IB/IPoIB: Switch from dma_device to dev.parent
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:26:17 -05:00
Bart Van Assche
85e9f1dbbd IB/rxe: Switch from dma_device to dev.parent
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:26:17 -05:00
Bart Van Assche
a62ef9a7d2 IB/vmw_pvrdma: Switch from dma_device to dev.parent
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Adit Ranadive <aditr@vmware.com>
Cc: VMware PV-Drivers <pv-drivers@vmware.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche
6b06d52dbe IB/usnic: Switch from dma_device to dev.parent
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christian Benvenuti <benve@cisco.com>
Cc: Dave Goodell <dgoodell@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche
989ab358f7 IB/qib: Switch from dma_device to dev.parent
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche
69117101f9 IB/qedr: Switch from dma_device to dev.parent
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Ram Amrani <Ram.Amrani@cavium.com>
Cc: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche
e6a73f2672 IB/ocrdma: Switch from dma_device to dev.parent
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Selvin Xavier <selvin.xavier@avagotech.com>
Cc: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche
a487a0bff3 IB/nes: Remove a superfluous assignment statement
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche
26e372705f IB/mthca: Switch from dma_device to dev.parent
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche
9b0c289ec4 IB/mlx5: Switch from dma_device to dev.parent
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Matan Barak <matanb@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche
d66c88a8fc IB/mlx4: Switch from dma_device to dev.parent
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche
f2296adccf IB/i40iw: Remove a superfluous assignment statement
Due to a previous patch initializing ib_device.dev.parent is
sufficient and initializing dma_device is no longer needed.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Faisal Latif <faisal.latif@intel.com>
Cc: Shiraz Saleem <shiraz.saleem@intel.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche
fecd02eb2c IB/hns: Switch from dma_device to dev.parent
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Lijun Ou <oulijun@huawei.com>
Cc: Wei Hu(Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche
3067771c51 IB/hfi1: Switch from dma_device to dev.parent
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche
d08868a15a IB/cxgb4: Set dev.parent instead of dma_device
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Acked-by: Steve Wise <swise@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche
91f734b4f3 IB/cxgb3: Set dev.parent instead of dma_device
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Steve Wise <swise@chelsio.com>
Acked-by: Steve Wise <swise@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche
1e35a0880f IB/core: Use dev.parent instead of dma_device
Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche
97a9ea8480 IB/core: Initialize ib_device.dev.parent earlier
Move the ib_device.dev.parent initialization code from
ib_device_register_sysfs() to ib_register_device(). Additionally,
allow HBA drivers to set ib_device.dev.parent without setting
ib_device.dma_device. This is the first step towards removing
ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche
5f0cb80134 IB/qib: Remove DMA mapping code
The qib DMA mapping code is no longer built since commit eb636ac0e4
("IB/qib: Remove dma.c and use rdmavt version of dma functions"). Hence
remove it.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche
e6d356d3cd IB/hf1: Remove DMA mapping code
The hfi1 DMA mapping code has never been built in any upstream kernel.
Hence remove it.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Bart Van Assche
5657933dbb treewide: Move dma_ops from struct dev_archdata into struct device
Some but not all architectures provide set_dma_ops(). Move dma_ops
from struct dev_archdata into struct device such that it becomes
possible on all architectures to configure dma_ops per device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Russell King <linux@armlinux.org.uk>
Cc: x86@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 12:23:35 -05:00
Max Gurtovoy
83236f0157 IB/iser: remove unused variable from iser_conn struct
max_sectors calculation was fixed in commit:
9c674815d3 ("IB/iser: Fix max_sectors calculation").
Thus, iser_conn variable scsi_max_sectors is not needed anymore.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 11:37:45 -05:00
Max Gurtovoy
1e5db6c31a IB/iser: Fix sg_tablesize calculation
For devices that can register page list that is bigger than
USHRT_MAX, we actually take the wrong value for sg_tablesize.
E.g: for CX4 max_fast_reg_page_list_len is 65536 (bigger than USHRT_MAX)
so we set sg_tablesize to 0 by mistake. Therefore, each IO that is
bigger than 4k splitted to "< 4k" chunks that cause performance degredation.
Remove wrong sg_tablesize assignment, and use the value that was set during
address resolution handler with the needed casting.

Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 11:37:45 -05:00
Israel Rukshin
0a475ef422 IB/srp: fix invalid indirect_sg_entries parameter value
After setting indirect_sg_entries module_param to huge value (e.g 500,000),
srp_alloc_req_data() fails to allocate indirect descriptors for the request
ring (kmalloc fails). This commit enforces the maximum value of
indirect_sg_entries to be SG_MAX_SEGMENTS as signified in module param
description.

Fixes: 65e8617fba (scsi: rename SCSI_MAX_{SG, SG_CHAIN}_SEGMENTS)
Fixes: c07d424d61 (IB/srp: add support for indirect tables that don't fit in SRP_CMD)
Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>--
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 11:30:14 -05:00
Israel Rukshin
ad8e66b4a8 IB/srp: fix mr allocation when the device supports sg gaps
If the device support arbitrary sg list mapping (device cap
IB_DEVICE_SG_GAPS_REG set) we allocate the memory regions with
IB_MR_TYPE_SG_GAPS.

Fixes: 509c5f33f4 ("IB/srp: Prevent mapping failures")
Cc: <stable@vger.kernel.org> # 4.7+
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24 11:03:17 -05:00
Mohamad Haj Yahia
105433659d net/mlx5: Add support to s-tag in mlx5 firmware interface
Add svlan_tag and rename vlan_tag to cvlan_tag in flow table entry
match param.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-01-19 23:19:55 +02:00
Peter Zijlstra
2c935bc572 locking/atomic, kref: Add kref_read()
Since we need to change the implementation, stop exposing internals.

Provide kref_read() to read the current reference count; typically
used for debug messages.

Kills two anti-patterns:

	atomic_read(&kref->refcount)
	kref->refcount.counter

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14 11:37:18 +01:00
Jack Wang
102c5ce082 RDMA/cma: use cached port state when bind loopback
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Michael Wang <yun.wang@profitbricks.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 23:00:04 -05:00
Jack Wang
93b1f29de7 RDMA/cma: resolve to first active ib port
When we try to resolve a dest addr, if we don't give src addr,
cma core will try to resolve to our source ib device automatically.
The current logic only checks if a given port has the same
subnet_prefix as our dest, which is not enough if we use default
well known subnet_prefix on our active port, as it will be the same
as the subnet_prefix on inactive ports and we might match against
an inactive port by accident.  To resolve this, we should also check
if port is active before we resolve it as a suitable src address for
a given dest.

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Michael Wang <yun.wang@profitbricks.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 23:00:04 -05:00
Jack Wang
9e2c3f1c7f RDMA/core: export ib_get_cached_port_state
Export function for rdma_cm, patch for rdma_cm to follow.

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Michael Wang <yun.wang@profitbricks.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 23:00:00 -05:00
Jack Wang
aaaca121c7 RDMA/core: add port state cache
We need a port state cache in ib_core, later we will use in rdma_cm.

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Michael Wang <yun.wang@profitbricks.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 22:59:55 -05:00
Feras Daoud
27d41d29c7 IB/ipoib: Change list_del to list_del_init in the tx object
Since ipoib_cm_tx_start function and ipoib_cm_tx_reap function
belong to different work queues, they can run in parallel.
In this case if ipoib_cm_tx_reap calls list_del and release the
lock, ipoib_cm_tx_start may acquire it and call list_del_init
on the already deleted object.
Changing list_del to list_del_init in ipoib_cm_tx_reap fixes the problem.

Fixes: 839fcaba35 ("IPoIB: Connected mode experimental support")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 14:01:06 -05:00
Feras Daoud
c586071d1d IB/ipoib: Replace list_del of the neigh->list with list_del_init
In order to resolve a situation where a few process delete
the same list element in sequence and cause panic, list_del
is replaced with list_del_init. In this case if the first
process that calls list_del releases the lock before acquiring
it again, other processes who can acquire the lock will call
list_del_init.

Fixes: b63b70d877 ("IPoIB: Use a private hash table for path lookup")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 14:01:05 -05:00
Feras Daoud
13ee429a02 IB/ipoib: Use debug prints instead of warnings in RNR WC status
If a receive request has not been posted to the work queue, the incoming
message is rejected and the peer will receive a receiver-not-ready (RNR)
error. In IPoIB, IB_WC_RNR_RETRY_EXC_ERR error is part of the life cycle
therefore ipoib_cm_handle_tx_wc function will print to debug instead
of warnings.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 14:01:05 -05:00
Feras Daoud
d32b9a81d7 IB/ipoib: Add detailed error message to dev_queue_xmit call
Add a detailed return code to dev_queue_xmit function when
calling to requeue packet via __skb_dequeue.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 14:01:04 -05:00
Feras Daoud
89a3987ab7 IB/ipoib: rtnl_unlock can not come after free_netdev
The ipoib_vlan_add function calls rtnl_unlock after free_netdev,
rtnl_unlock not only releases the lock, but also calls netdev_run_todo.
The latter function browses the net_todo_list array and completes the
unregistration of all its net_device instances. If we call free_netdev
before rtnl_unlock, then netdev_run_todo call over the freed device causes
panic.
To fix, move rtnl_unlock call before free_netdev call.

Fixes: 9baa0b0364 ("IB/ipoib: Add rtnl_link_ops support")
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 14:01:04 -05:00
Feras Daoud
0a0007f283 IB/ipoib: Fix deadlock between rmmod and set_mode
When calling set_mode from sys/fs, the call flow locks the sys/fs lock
first and then tries to lock rtnl_lock (when calling ipoib_set_mod).
On the other hand, the rmmod call flow takes the rtnl_lock first
(when calling unregister_netdev) and then tries to take the sys/fs
lock. Deadlock a->b, b->a.

The problem starts when ipoib_set_mod frees it's rtnl_lck and tries
to get it after that.

    set_mod:
    [<ffffffff8104f2bd>] ? check_preempt_curr+0x6d/0x90
    [<ffffffff814fee8e>] __mutex_lock_slowpath+0x13e/0x180
    [<ffffffff81448655>] ? __rtnl_unlock+0x15/0x20
    [<ffffffff814fed2b>] mutex_lock+0x2b/0x50
    [<ffffffff81448675>] rtnl_lock+0x15/0x20
    [<ffffffffa02ad807>] ipoib_set_mode+0x97/0x160 [ib_ipoib]
    [<ffffffffa02b5f5b>] set_mode+0x3b/0x80 [ib_ipoib]
    [<ffffffff8134b840>] dev_attr_store+0x20/0x30
    [<ffffffff811f0fe5>] sysfs_write_file+0xe5/0x170
    [<ffffffff8117b068>] vfs_write+0xb8/0x1a0
    [<ffffffff8117ba81>] sys_write+0x51/0x90
    [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

    rmmod:
    [<ffffffff81279ffc>] ? put_dec+0x10c/0x110
    [<ffffffff8127a2ee>] ? number+0x2ee/0x320
    [<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0
    [<ffffffff8127cc04>] ? vsnprintf+0x484/0x5f0
    [<ffffffff8127b550>] ? string+0x40/0x100
    [<ffffffff814fe323>] wait_for_common+0x123/0x180
    [<ffffffff81060250>] ? default_wake_function+0x0/0x20
    [<ffffffff8119661e>] ? ifind_fast+0x5e/0xb0
    [<ffffffff814fe43d>] wait_for_completion+0x1d/0x20
    [<ffffffff811f2e68>] sysfs_addrm_finish+0x228/0x270
    [<ffffffff811f2fb3>] sysfs_remove_dir+0xa3/0xf0
    [<ffffffff81273f66>] kobject_del+0x16/0x40
    [<ffffffff8134cd14>] device_del+0x184/0x1e0
    [<ffffffff8144e59b>] netdev_unregister_kobject+0xab/0xc0
    [<ffffffff8143c05e>] rollback_registered+0xae/0x130
    [<ffffffff8143c102>] unregister_netdevice+0x22/0x70
    [<ffffffff8143c16e>] unregister_netdev+0x1e/0x30
    [<ffffffffa02a91b0>] ipoib_remove_one+0xe0/0x120 [ib_ipoib]
    [<ffffffffa01ed95f>] ib_unregister_device+0x4f/0x100 [ib_core]
    [<ffffffffa021f5e1>] mlx4_ib_remove+0x41/0x180 [mlx4_ib]
    [<ffffffffa01ab771>] mlx4_remove_device+0x71/0x90 [mlx4_core]

Fixes: 862096a8bb ("IB/ipoib: Add more rtnl_link_ops callbacks")
Cc: <stable@vger.kernel.org> # v3.6+
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 14:01:03 -05:00
Feras Daoud
1c3098cdb0 IB/ipoib: Fix deadlock over vlan_mutex
This patch fixes Deadlock while executing ipoib_vlan_delete.

The function takes the vlan_rwsem semaphore and calls
unregister_netdevice. The later function calls
ipoib_mcast_stop_thread that cause workqueue flush.

When the queue has one of the ipoib_ib_dev_flush_xxx events,
a deadlock occur because these events also tries to catch the
same vlan_rwsem semaphore.

To fix, unregister_netdevice should be called after releasing
the semaphore.

Fixes: cbbe1efa49 ("IPoIB: Fix deadlock between ipoib_open() and child interface create")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 14:01:02 -05:00
Feras Daoud
80b5b35aba IB/ipoib: Set device connection mode only when needed
When changing the connection mode, the ipoib_set_mode function
did not check if the previous connection mode equals to the
new one. This commit adds the required check and return 0 if the new
mode equals to the previous one.

Fixes: 839fcaba35 ("IPoIB: Connected mode experimental support")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 14:01:02 -05:00
Feras Daoud
29da686dff IB/ipoib: When given an invalid UD MTU, give debug msg
In datagram mode, the IB UD (Unreliable Datagram) transport is used
so the MTU of the interface is equal to the IB L2 MTU minus the
IPoIB encapsulation header. Any request to change the MTU value
above the maximum range will change the MTU to the max allowed, but
will not show any warning message. An ipoib_warn is issued in such
cases, letting the user know that even though the value is legal,
it can't be currently applied.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 13:59:56 -05:00
ssh10
db287ec5cb RDMA/ocrdma: Replace BUG() with BUG_ON()
Replace BUG() with BUG_ON() using coccinelle

Signed-off-by: Shyam Saini <mayhs11saini@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 12:21:52 -05:00
ssh10
b462b06eb6 RDMA/cxgb4: Use AF_INET for sin_family field
Elsewhere the sin_family field holds a value with a name of the form
AF_..., so it seems reasonable to do so here as well.  Also the values
of PF_INET and AF_INET are the same.

The semantic patch that makes this change is as follows:

//</smpl>
@@
struct sockaddr_in sip;
@@

(
sip.sin_family ==
- PF_INET
+ AF_INET
|
sip.sin_family !=
- PF_INET
+ AF_INET
|
sip.sin_family =
- PF_INET
+ AF_INET
)
//</smpl>

Signed-off-by: Shyam Saini <mayhs11saini@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 12:21:52 -05:00
Amrani, Ram
df15856132 RDMA/qedr: restructure functions that create/destroy QPs
Simplify function and sub-function flow of QP creation and destruction.
This also serves as a preparation for SRQ and iWARP support.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Reviewed-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 12:21:41 -05:00
Geliang Tang
bb75f33cf0 RDMA/qib: use rb_entry()
To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 11:38:41 -05:00
Cao jin
e8f4eb3bfa RDMA/hfi1: drop pci_link_reset()
In AER recovery, pci_error_handlers.link_reset() is never called,
drop it now.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 11:38:41 -05:00
Cao jin
850d08721a RDMA/qib: drop qib_pci_link_reset()
In AER recovery, pci_error_handlers.link_reset() is never called,
drop it now.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 11:38:41 -05:00
Kees Cook
7f6856b789 RDMA/i40iw: use designated initializers
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 11:38:41 -05:00
Kees Cook
6554c9f7f7 RDMA/nes: use designated initializers
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-12 11:38:41 -05:00
Bart Van Assche
c5540a0195 IB/rxe: Fix an skb leak
Additionally, make it easier to detect skb leaks by issuing a warning
if a leak occurs.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Cc: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 16:52:47 -05:00
Bart Van Assche
839f5ac0d8 IB/rxe: Remove a pointless indirection layer
Neither rxe->ifc_ops nor any of the function pointers in struct
struct rxe_ifc_ops ever change. Hence remove the rxe->ifc_ops
indirection mechanism.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 16:52:47 -05:00
Bart Van Assche
ab17654476 IB/rxe: Fix reference leaks in memory key invalidation code
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 16:52:47 -05:00
Bart Van Assche
b3a4599610 IB/rxe: Fix a MR reference leak in check_rkey()
Avoid that calling check_rkey() for mem->state == RXE_MEM_STATE_FREE
triggers an MR reference leak.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 16:52:47 -05:00
Bart Van Assche
18d3451c0d IB/rxe: Generate a completion for all failed work requests
Change do_complete() such that an error completion is not only
generated if a QP is in the error state but also if a work request
failed.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 16:52:47 -05:00
Bart Van Assche
723ec9ae2a IB/rxe: Introduce functions for queue draining
This change makes the code easier to read and avoids that code is
duplicated.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 16:52:47 -05:00
Bart Van Assche
642c7cbcaf IB/rxe: Add a runtime check in alloc_index()
Since index values equal to or above 'range' can trigger memory
corruption, complain if index >= range.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 16:52:47 -05:00
Bart Van Assche
43553b47c3 IB/rxe: Issue warnings once
It is strongly recommended to report kernel warnings once instead
of every time a condition is hit. Hence change WARN_ON() into
WARN_ON_ONCE() / BUILD_BUG_ON() as appropriate.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 16:52:47 -05:00
Bart Van Assche
32404fb764 IB/rxe: Let the compiler check the type of the cleanup functions
Change the argument type of these functions from void * into
struct rxe_pool_entry *.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 16:52:47 -05:00
Bart Van Assche
046ef24d25 IB/rxe: Enable type checking on SKB_TO_PKT() and PKT_TO_SKB() arguments
Let the compiler check the type of the arguments passed to SKB_TO_PKT()
and PKT_TO_SKB().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 16:52:47 -05:00
Bart Van Assche
967335ab90 IB/rxe: Remove superfluous casts
Casting a pointer to 'void *' explicitly is not necessary in C code.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 16:52:47 -05:00
Bart Van Assche
175f1244c1 IB/rxe: Remove an unused variable and an unused argument
The variable 'av' is not used so remove it. Since that change
removes the last user of the 'wqe' argument, remove that argument
too.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 16:52:47 -05:00
Bart Van Assche
c8b82182cb IB/rxe: Remove an unused function
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 16:52:47 -05:00
Bart Van Assche
2bec3baded IB/rxe: Constify the pool name
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 16:52:47 -05:00
Bart Van Assche
8d8f083720 IB/rxe: Suppress sparse warnings
Avoid that sparse complains about using 0 as a pointer, about
missing function declarations and also avoid that sparse complains
about endianness.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 16:52:47 -05:00
Selvin Xavier
69ae543969 RDMA: Adding ethertype ETH_P_IBOE
Update the if_ether.h with the  ethertype for Infiniband over
Ethernet packets. Also, removing the occurances of 0x8915
from infiniband vendor drivers.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 14:05:11 -05:00
Steve Wise
3bcf96e018 iw_cxgb4: do not send RX_DATA_ACK CPLs after close/abort
Function rx_data(), which handles ingress CPL_RX_DATA messages, was
always sending an RX_DATA_ACK with the goal of updating the credits.
However, if the RDMA connection is moved out of FPDU mode abruptly,
then it is possible for iw_cxgb4 to process queued RX_DATA CPLs after HW
has aborted the connection.  These CPLs should not trigger RX_DATA_ACKS.
If they do, HW can see a READ after DELETE of the DB_LE hash entry for
the tid and post a LE_DB HashTblMemCrcError.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 14:01:38 -05:00
Steve Wise
c12a67fec8 iw_cxgb4: free EQ queue memory on last deref
Commit ad61a4c7a9 ("iw_cxgb4: don't block in destroy_qp awaiting
the last deref") introduced a bug where the RDMA QP EQ queue memory
(and QIDs) are possibly freed before the underlying connection has been
fully shutdown.  The result being a possible DMA read issued by HW after
the queue memory has been unmapped and freed.  This results in possible
WR corruption in the worst case, system bus errors if an IOMMU is in use,
and SGE "bad WR" errors reported in the very least.  The fix is to defer
unmap/free of queue memory and QID resources until the QP struct has
been fully dereferenced.  To do this, the c4iw_ucontext must also be kept
around until the last QP that references it is fully freed.  In addition,
since the last QP deref can happen in an IRQ disabled context, we need
a new workqueue thread to do the final unmap/free of the EQ queue memory.

Fixes: ad61a4c7a9 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 14:01:38 -05:00
Steve Wise
4fe7c2962e iw_cxgb4: refactor sq/rq drain logic
With the addition of the IB/Core drain API, iw_cxgb4 supported drain
by watching the CQs when the QP was out of RTS and signalling "drain
complete" when the last CQE is polled.  This, however, doesn't fully
support the drain semantics. Namely, the drain logic is supposed to signal
"drain complete" only when the application has _processed_ the last CQE,
not just removed them from the CQ.  Thus a small timing hole exists that
can cause touch after free type bugs in applications using the drain API
(nvmf, iSER, for example).  So iw_cxgb4 needs a better solution.

The iWARP Verbs spec mandates that "_at some point_ after the QP is
moved to ERROR", the iWARP driver MUST synchronously fail post_send and
post_recv calls.  iw_cxgb4 was currently not allowing any posts once the
QP is in ERROR.  This was in part due to the fact that the HW queues for
the QP in ERROR state are disabled at this point, so there wasn't much
else to do but fail the post operation synchronously.  This restriction
is what drove the first drain implementation in iw_cxgb4 that has the
above mentioned flaw.

This patch changes iw_cxgb4 to allow post_send and post_recv WRs after
the QP is moved to ERROR state for kernel mode users, thus still adhering
to the Verbs spec for user mode users, but allowing flush WRs for kernel
users.  Since the HW queues are disabled, we just synthesize a CQE for
this post, queue it to the SW CQ, and then call the CQ event handler.
This enables proper drain operations for the various storage applications.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10 14:01:38 -05:00
Parav Pandit
43579b5f2c IB/core: added support to use rdma cgroup controller
Added support APIs for IB core to register/unregister every IB/RDMA
device with rdma cgroup for tracking rdma resources.
IB core registers with rdma cgroup controller.
Added support APIs for uverbs layer to make use of rdma controller.
Added uverbs layer to perform resource charge/uncharge functionality.
Added support during query_device uverb operation to ensure it
returns resource limits by honoring rdma cgroup configured limits.

Signed-off-by: Parav Pandit <pandit.parav@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-01-10 11:14:27 -05:00
David S. Miller
bda65b4255 mlx5 4K UAR
The following series of patches optimizes the usage of the UAR area which is
 contained within the BAR 0-1. Previous versions of the firmware and the driver
 assumed each system page contains a single UAR. This patch set will query the
 firmware for a new capability that if published, means that the firmware can
 support UARs of fixed 4K regardless of system page size. In the case of
 powerpc, where page size equals 64KB, this means we can utilize 16 UARs per
 system page. Since user space processes by default consume eight UARs per
 context this means that with this change a process will need a single system
 page to fulfill that requirement and in fact make use of more UARs which is
 better in terms of performance.
 
 In addition to optimizing user-space processes, we introduce an allocator
 that can be used by kernel consumers to allocate blue flame registers
 (which are areas within a UAR that are used to write doorbells). This provides
 further optimization on using the UAR area since the Ethernet driver makes
 use of a single blue flame register per system page and now it will use two
 blue flame registers per 4K.
 
 The series also makes changes to naming conventions and now the terms used in
 the driver code match the terms used in the PRM (programmers reference manual).
 Thus, what used to be called UUAR (micro UAR) is now called BFREG (blue flame
 register).
 
 In order to support compatibility between different versions of
 library/driver/firmware, the library has now means to notify the kernel driver
 that it supports the new scheme and the kernel can notify the library if it
 supports this extension. So mixed versions of libraries can run concurrently
 without any issues.
 
 Thanks,
         Eli and Matan
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJYc9kSAAoJEEg/ir3gV/o+a0EH/jEGiopH7CHc4T4nXT1I4kQa
 TicrkMNV3Sr9MBWwn8TLOyx+Fi1dex4cumrJI/BNVjC6h/nS6JHbslYoZxTkX9lT
 L0vRsHJBVr/PODqimIGNnlJFBPhNJSGiHG4JHlJHlpvcGNahitN3gXmUjcRNju+V
 ExnvgwWzAXM0qg1qWf5A/3HmqbtYES1rJXQUsimtc2QAif/SIayBD4fEA8x5zNBA
 i0p8xcDrzUqmeblkpnsJA3w40s1rsuqvJnvLPDpbpKENtHfw1UFZ2987P7LvOrIv
 NF/mZBkStC0gOZX6dLEAdoZXL1gTsJX19hTkUMfYH4BHqHARa2/oCS3wcCf1Giw=
 =C+cp
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-4kuar-for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

====================
mlx5 4K UAR

The following series of patches optimizes the usage of the UAR area which is
contained within the BAR 0-1. Previous versions of the firmware and the driver
assumed each system page contains a single UAR. This patch set will query the
firmware for a new capability that if published, means that the firmware can
support UARs of fixed 4K regardless of system page size. In the case of
powerpc, where page size equals 64KB, this means we can utilize 16 UARs per
system page. Since user space processes by default consume eight UARs per
context this means that with this change a process will need a single system
page to fulfill that requirement and in fact make use of more UARs which is
better in terms of performance.

In addition to optimizing user-space processes, we introduce an allocator
that can be used by kernel consumers to allocate blue flame registers
(which are areas within a UAR that are used to write doorbells). This provides
further optimization on using the UAR area since the Ethernet driver makes
use of a single blue flame register per system page and now it will use two
blue flame registers per 4K.

The series also makes changes to naming conventions and now the terms used in
the driver code match the terms used in the PRM (programmers reference manual).
Thus, what used to be called UUAR (micro UAR) is now called BFREG (blue flame
register).

In order to support compatibility between different versions of
library/driver/firmware, the library has now means to notify the kernel driver
that it supports the new scheme and the kernel can notify the library if it
supports this extension. So mixed versions of libraries can run concurrently
without any issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 17:09:31 -05:00
Eli Cohen
30aa60b3bd IB/mlx5: Support 4k UAR for libmlx5
Add fields to structs to convey to kernel an indication whether the
library supports multi UARs per page and return to the library the size
of a UAR based on the queried value.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-09 20:25:09 +02:00
Eli Cohen
b037c29a80 IB/mlx5: Allow future extension of libmlx5 input data
Current check requests that new fields in struct
mlx5_ib_alloc_ucontext_req_v2 that are not known to the driver be zero.
This was introduced so new libraries passing additional information to
the kernel through struct mlx5_ib_alloc_ucontext_req_v2 will be notified
by old kernels that do not support their request by failing the
operation. This schecme is problematic since it requires libmlx5 to issue
the requests with descending input size for struct
mlx5_ib_alloc_ucontext_req_v2.

To avoid this, we require that new features that will obey the following
rules:
If the feature requires one or more fields in the response and the at
least one of the fields can be encoded such that a zero value means the
kernel ignored the request then this field will provide the indication
to the library. If no response is required or if zero is a valid
response, a new field should be added that indicates to the library
whether its request was processed.

Fixes: b368d7cb8c ('IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-09 20:25:09 +02:00
Eli Cohen
5fe9dec0d0 IB/mlx5: Use blue flame register allocator in mlx5_ib
Make use of the blue flame registers allocator at mlx5_ib. Since blue
flame was not really supported we remove all the code that is related to
blue flame and we let all consumers to use the same blue flame register.
Once blue flame is supported we will add the code. As part of this patch
we also move the definition of struct mlx5_bf to mlx5_ib.h as it is only
used by mlx5_ib.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-09 20:25:08 +02:00
Eli Cohen
0b80c14f00 IB/mlx5: Fix retrieval of index to first hi class bfreg
First the function retrieving the index of the first hi latency class
blue flame register. High latency class bfregs are located right above
medium latency class bfregs.

Fixes: c1be5232d2 ('IB/mlx5: Fix micro UAR allocator')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-08 11:21:26 +02:00
Eli Cohen
2f5ff26478 mlx5: Fix naming convention with respect to UARs
This establishes a solid naming conventions for UARs. A UAR (User Access
Region) can have size identical to a system page or can be fixed 4KB
depending on a value queried by firmware. Each UAR always has 4 blue
flame register which are used to post doorbell to send queue. In
addition, a UAR has section used for posting doorbells to CQs or EQs. In
this patch we change names to reflect this conventions.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-08 11:21:26 +02:00
Eli Cohen
f4044dac63 IB/mlx5: Fix error handling order in create_kernel_qp
Make sure order of cleanup is exactly the opposite of initialization.

Fixes: 9603b61de1 ('mlx5: Move pci device handling from mlx5_ib to mlx5_core')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-08 11:21:26 +02:00
Eli Cohen
de8d6e02ef IB/mlx5: Fix kernel to user leak prevention logic
The logic was broken as it failed to update the response length for
architectures with PAGE_SIZE larger than 4kB. As a result further
extension of the ucontext response struct would fail.

Fixes: d69e3bcf79 ('IB/mlx5: Mmap the HCA's core clock register to user-space')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-08 11:21:26 +02:00
David S. Miller
76eb75be79 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-01-05 11:03:07 -05:00
Artemy Kovalyov
aa8e08d2f5 IB/mlx5: Improve MR check
Add "type" field to mlx5_core MKEY struct.
Check whether page fault happens on MKEY corresponding to MR.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02 15:51:20 -05:00
Artemy Kovalyov
17d2f88f92 IB/mlx5: Add ODP atomics support
Handle ODP atomic operations. When initiator of RDMA atomic
operation use ODP MR to provide source data handle pagefault properly.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02 15:51:20 -05:00
Artemy Kovalyov
d9aaed8387 {net,IB}/mlx5: Refactor page fault handling
* Update page fault event according to last specification.
* Separate code path for page fault EQ, completion EQ and async EQ.
* Move page fault handling work queue from mlx5_ib static variable
  into mlx5_core page fault EQ.
* Allocate memory to store ODP event dynamically as the
  events arrive, since in atomic context - use mempool.
* Make mlx5_ib page fault handler run in process context.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02 15:51:20 -05:00
Artemy Kovalyov
7d0cc6edcc IB/mlx5: Add MR cache for large UMR regions
In this change we turn mlx5_ib_update_mtt() into generic
mlx5_ib_update_xlt() to perfrom HCA translation table modifiactions
supporting both atomic and process contexts and not limited by number
of modified entries.
Using this function we increase preallocated MRs up to 16GB.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02 15:51:20 -05:00
Artemy Kovalyov
c438fde1c2 IB/mlx5: Add support for big MRs
Make use of extended UMR translation offset.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02 15:51:20 -05:00
Artemy Kovalyov
3161625589 IB/mlx5: Refactor UMR post send format
* Update struct mlx5_wqe_umr_ctrl_seg.
* Currenlty UMR send_flags aim only certain use cases: enabled/disable
  cached MR, modifying XLT for ODP. By making flags independent make UMR
  more flexible allowing arbitrary manipulations.
* Since different UMR formats have different entry sizes UMR request
  should receive exact size of translation table update instead of
  number of entries. Rename field npages to xlt_size in struct mlx5_umr_wr
  and update relevant code accordingly.
* Add support of length64 bit.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02 15:51:20 -05:00
Binoy Jayan
d5ea2df9ce IB/mlx5: Add helper mlx5_ib_post_send_wait
Clean up the following common code (to post a list of work requests to the
send queue of the specified QP) at various places and add a helper function
'mlx5_ib_post_send_wait' to implement the same.

 - Initialize 'mlx5_ib_umr_context' on stack
 - Assign "mlx5_umr_wr:wr:wr_cqe to umr_context.cqe
 - Acquire the semaphore
 - call ib_post_send with a single ib_send_wr
 - wait_for_completion()
 - Check for umr_context.status
 - Release the semaphore

Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02 15:51:20 -05:00
Leon Romanovsky
9f885201f2 IB/mlx5: Reorder code in query device command
The order of features exposed by private mlx5-abi.h
file is CQE zipping, packet pacing and multi-packet WQE.

The internal order implemented in mlx5_ib_query_device() is
multi-packet WQE, CQE zipping and packet pacing.

Such difference hurts code readability, so let's sync,
while mlx5-abi.h (exposed to userspace) is the primary
order.

This commit doesn't change any functionality.

Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02 15:51:20 -05:00
Jack Morgenstein
10b1c04e92 net/mlx4_core: Fix raw qp flow steering rules under SRIOV
Demoting simple flow steering rule priority (for DPDK) was achieved by
wrapping FW commands MLX4_QP_FLOW_STEERING_ATTACH/DETACH for the PF
as well, and forcing the priority to MLX4_DOMAIN_NIC in the wrapper
function for the PF and all VFs.

In function mlx4_ib_create_flow(), this change caused the main rule
creation for the PF to be wrapped, while it left the associated
tunnel steering rule creation unwrapped for the PF.

This mismatch caused rule deletion failures in mlx4_ib_destroy_flow()
for the PF when the detach wrapper function did not find the associated
tunnel-steering rule (since creation of that rule for the PF did not
go through the wrapper function).

Fix this by setting MLX4_QP_FLOW_STEERING_ATTACH/DETACH to be "native"
(so that the PF invocation does not go through the wrapper), and perform
the required priority demotion for the PF in the mlx4_ib_create_flow()
code path.

Fixes: 48564135cb ("net/mlx4_core: Demote simple multicast and broadcast flow steering rules")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29 14:17:40 -05:00
Linus Torvalds
7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Linus Torvalds
296915912d First round of -rc fixes for 4.10 kernel
- Series of qedr fixes
 - Series of rxe fixes
 - One isolated i40iw fix
 - One isolated cma fix
 - One isolated cxgb4 fix
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYXAGvAAoJELgmozMOVy/dDukQAMMNarWp0U8KfNYRU5tyCBwd
 aIQC1gFT6GUCFys40Z6L84m1D3NpGR+vzVv3grVBeuge73b79zAOHXvVDwJCA+Jl
 QQLG3vZ13C3158sLDiK8zL+4Ob5OfOQ5nQ2spvDfJWpye9SD+pWFcrpqvK02ANRN
 kFHILk1gROBTNi46yBR5hjWOkw7Bua6XLsPxh6xoaDZ43NL0r0xgm43FTnj/19x3
 0zpZYYKP+3C6U7678rqaog9zfXHvadghW5/WBJ/VgfKqEmH89ESx4J2MvbB8DxFD
 1tWAOpr5TNY5jnh8mtUsceDjCzQivc/RWqAu05BspEwcavjSLFyRYr1epR0/4oAd
 PqLSmfORmhpJ8+5Kmn+chtXo3TT4SYGHIzSUbgbEV/ClwX/7UW+w8mfQZ3buUBq/
 cQp/oRnJcsrQIEDFO3AH7P+6Sxy6t3zbSl5oKBUOI1u4RFmC7YBPqo9fQu2Z2mGk
 3+AWQaPr7qgEcFzXBgLzvd4LhTYKsvmiNwrcXi9KjjwQjNEVg15qqF2YtmxEUgi9
 kh3IOcGan3iSblhV/WLrxcOjlPQrPpBOVnTPhUskFtlsrD+032OxeOBpVoU3nCUt
 MjTYWoNTYdw4wHz0w373o0uR4+4nl4a5OmO4Fh6Drmg5hm4Bl9BWy0Kziu93Z1Ay
 Z2utZVWLWhBzn8yJujUz
 =NW9g
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma fixes from Doug Ledford:
 "First round of -rc fixes for 4.10 kernel:

   - a series of qedr fixes
   - a series of rxe fixes
   - one i40iw fix
   - one cma fix
   - one cxgb4 fix"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/rxe: Don't check for null ptr in send()
  IB/rxe: Drop future atomic/read packets rather than retrying
  IB/rxe: Use BTH_PSN_MASK when ACKing duplicate sends
  qedr: Always notify the verb consumer of flushed CQEs
  qedr: clear the vendor error field in the work completion
  qedr: post_send/recv according to QP state
  qedr: ignore inline flag in read verbs
  qedr: modify QP state to error when destroying it
  qedr: return correct value on modify qp
  qedr: return error if destroy CQ failed
  qedr: configure the number of CQEs on CQ creation
  i40iw: Set 128B as the only supported RQ WQE size
  IB/cma: Fix a race condition in iboe_addr_get_sgid()
  IB/rxe: Fix a memory leak in rxe_qp_cleanup()
  iw_cxgb4: set correct FetchBurstMax for QPs
2016-12-23 10:38:48 -08:00
Andrew Boyer
5cc8fabc5e IB/rxe: Don't check for null ptr in send()
pkt->qp was already dereferenced earlier in the function.

Fixes Smatch complaint:
drivers/infiniband/sw/rxe/rxe_net.c:458 send()
	 warn: variable dereferenced before check 'pkt->qp' (see line 441)

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22 11:36:12 -05:00
Andrew Boyer
cbf1f9a46c IB/rxe: Drop future atomic/read packets rather than retrying
If the completer is in the middle of a large read operation, one
lost packet can cause havoc. Going to COMPST_ERROR_RETRY will
cause the requester to resend the request. After that, any packet
from the first attempt still in the receive queue will be
interpreted as an error, restarting the error/retry sequence.
The transfer will quickly exhaust its retries.

This behavior is very noticeable when doing 512KB reads on a
QEMU system configured with 1500B MTU.

Also, a resent request here will prompt the responder on the
other side to immediately start resending, but the resent
packets will get stuck in the already-loaded receive queue and
will never be processed.

Rather than erroring out every time an unexpected future packet
arrives, just drop it. Eventually the retry timer will send a
duplicate request; the completer will be able to make progress since
the queue will start relatively empty.

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-22 11:36:12 -05:00