To improve the readability of function prototypes, give the parameters
names in the affinity module.
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The current user SDMA request structure layout has holes.
The cachelines can be reduced to improve cacheline trading.
Separate fields in the following categories: mostly read,
writable and shared with interrupt.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
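To illustrate the kind of hole the layout change above removes, here is a minimal sketch. Both structures and their member names are hypothetical, not the actual user SDMA request structure; they only show how reordering members shrinks a structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with holes: each small member forces padding
 * before the following 8-byte member. */
struct req_with_holes {
	uint8_t  flags;     /* followed by padding */
	uint64_t seqnum;
	uint16_t status;    /* followed by padding */
	uint64_t npkts;
};

/* Same members grouped by size, so the small fields share one slot. */
struct req_packed {
	uint64_t seqnum;
	uint64_t npkts;
	uint16_t status;
	uint8_t  flags;
};

/* Bytes lost to padding in the first layout relative to the second. */
static size_t bytes_saved(void)
{
	return sizeof(struct req_with_holes) - sizeof(struct req_packed);
}
```

The exact sizes depend on the ABI. Grouping fields by access pattern (mostly read, writable, shared with interrupt) follows the same idea at cacheline granularity.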
An RB tree is used for the SDMA pinning cache. Cache
entries are extracted and reinserted from the tree
in case the address range for it changes. However,
if the address range for the entry doesn't change,
deleting the entry from the RB tree is not necessary.
This affects performance since the tree needs to be
rebalanced for each insertion, and this happens in
the hot path. Optimize RB search by not removing
entries when it's not needed.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
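The idea of the RB search optimization above can be sketched as follows. This is only an illustration: the structure, the counters and cache_update() are hypothetical stand-ins, and no real tree is used.

```c
#include <stdbool.h>

struct cache_node {
	unsigned long addr;
	unsigned long len;
};

/* Counters standing in for real tree operations (hypothetical). */
struct cache_stats {
	unsigned int removals;
	unsigned int insertions;
};

/*
 * Update a cached entry. The costly remove + reinsert pair, which
 * would rebalance a real RB tree, only happens when the range changes.
 * Returns true if the entry had to be reinserted.
 */
static bool cache_update(struct cache_node *node, struct cache_stats *st,
			 unsigned long addr, unsigned long len)
{
	if (node->addr == addr && node->len == len)
		return false;           /* range unchanged: leave it in place */

	st->removals++;                 /* stands in for erasing the node */
	node->addr = addr;
	node->len = len;
	st->insertions++;               /* stands in for reinserting it */
	return true;
}
```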
SGEs that are contiguous needlessly consume driver dependent TX resources.
The lkey validation logic is enhanced to compress the SGE that ends
up in the send wqe when consecutive addresses are detected.
The lkey validation API used to return 1 (success) or 0 (fail).
The return value is now an -errno, 0 (compressed), or 1 (uncompressed). An
additional argument is added to pass the last SGE for the compression.
Loopback callers always pass a NULL to last_sge since the optimization is
of little benefit in that situation.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
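The return convention described for the lkey validation change above can be sketched as follows. The struct layout and sge_add() are hypothetical and stand in for the real validation path; the check that both SGEs share the same lkey is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct sge {
	uint64_t addr;
	uint32_t length;
	uint32_t lkey;
};

/*
 * Hypothetical helper following the return convention described above:
 *   -errno  on failure,
 *   0       if the new SGE was merged into *last (compressed),
 *   1       if the new SGE was stored in *isge (uncompressed).
 * Callers that do not want compression pass last == NULL.
 */
static int sge_add(struct sge *isge, struct sge *last, const struct sge *in)
{
	if (!isge || !in || in->length == 0)
		return -EINVAL;

	/* Contiguous with the previous SGE and same key: extend it. */
	if (last && last->lkey == in->lkey &&
	    last->addr + last->length == in->addr) {
		last->length += in->length;
		return 0;
	}

	*isge = *in;
	return 1;
}
```

A caller would advance its SGE count only when the helper returns 1, which is how contiguous SGEs stop consuming extra TX resources.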
We move many common IB fields into the hfi1_packet structure and
set them up in a single function. This allows us to set the fields
in a single place and not deal with them throughout the driver.
Reviewed-by: Brian Welty <brian.welty@intel.com>
Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Calls to trace incoming packets will now receive the packet
context as parameter. This enables trace support for future
packet types.
Header trace output is in the format <field>:<value>
which makes parsing easier.
input_ibhdr trace before change:
<idle>-0 [001] d.h. 5904.250925: input_ibhdr: [0000:05:00.0] vl 0
lver 0 sl 0 lnh 2,LRH_BTH dlid 0002 len 18 slid 0001 op
0x64,UD_SEND_ONLY se 0 m 0 pad 0 tver 0 pkey 0xffff f 0 b 0 qpn 0x000001
a 0 psn 0x000001b2 deth qkey 0x80010000 sqpn 0x000001
input_ibhdr trace after change:
<idle>-0 [001] d.h. 6655.714488: input_ibhdr: [0000:05:00.0] (IB)
len:124 sc:0 dlid:0x0001 slid:0x0002 lnh:2,LRH_BTH lver:0 sl:0 age:0
becn:0 fecn:0 l4:0 rc:0 entropy:0 op:0x64,UD_SEND_ONLY se:0 m:0 pad:0
tver:0 pkey:0x7fff f:0 b:0 qpn:0x000001 a:0 psn:0x00000036 hlen:8 deth
qkey:0x80010000 sqpn:0x000001
Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
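As an illustration of why the <field>:<value> trace format above is easier to parse, here is a small user-space sketch. trace_field() is a hypothetical helper, not part of any tracing tool.

```c
#include <stddef.h>
#include <string.h>

/*
 * Look up "name:value" in a space-separated trace line and copy the
 * value into out. Returns 0 on success, -1 if the field is absent or
 * out is too small. Matching is done on whole tokens, so "sl" does not
 * match "slid".
 */
static int trace_field(const char *line, const char *name,
		       char *out, size_t outlen)
{
	size_t nlen = strlen(name);
	const char *p = line;

	while (*p) {
		size_t tok;

		p += strspn(p, " \t\n");          /* skip separators */
		tok = strcspn(p, " \t\n");        /* token length */
		if (tok > nlen && p[nlen] == ':' && !strncmp(p, name, nlen)) {
			size_t vlen = tok - nlen - 1;

			if (vlen + 1 > outlen)
				return -1;
			memcpy(out, p + nlen + 1, vlen);
			out[vlen] = '\0';
			return 0;
		}
		p += tok;
	}
	return -1;
}
```

With the old format, a parser had to know that a bare token such as "sl" is followed by its value in the next token; with the new format each token is self-describing.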
Improve code readability by adding inline functions
to read specific BTH/IB fields without knowledge of
byte offsets.
Reviewed-by: Brian Welty <brian.welty@intel.com>
Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
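A user-space sketch of the accessor idea above, assuming the standard 12-byte BTH layout with an 8-bit opcode, 16-bit P_Key, 24-bit destination QP and 24-bit PSN. The function names are hypothetical and are not the driver's own helpers.

```c
#include <stdint.h>

/* Read a big-endian 32-bit word from a byte buffer. */
static inline uint32_t be32_load(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* In all helpers, bth points at the 12-byte Base Transport Header. */
static inline uint8_t bth_opcode(const uint8_t *bth)
{
	return be32_load(bth) >> 24;            /* top byte of word 0 */
}

static inline uint16_t bth_pkey(const uint8_t *bth)
{
	return be32_load(bth) & 0xffff;         /* low 16 bits of word 0 */
}

static inline uint32_t bth_qpn(const uint8_t *bth)
{
	return be32_load(bth + 4) & 0xffffff;   /* low 24 bits of word 1 */
}

static inline uint32_t bth_psn(const uint8_t *bth)
{
	return be32_load(bth + 8) & 0xffffff;   /* low 24 bits of word 2 */
}
```

Callers then read bth_qpn(hdr) instead of repeating shifts and masks at every use.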
Leftover function that is not used. Remove it.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The tx request is unnecessarily initialized in the hot
code path with memset(). There is no need to do
this, as most fields are initialized later on, and this
initialization shows up as costly in the profile.
Remove unnecessary initialization from tx request and make
sure all variables are initialized properly.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Reserved GIDs are taken by mlx5_core, so report a smaller GID table
size to IB core.
Set mlx5_query_roce_port's return value back to int. In case of
error, return an indication. This rolls back some of the change
in commit 50f22fd8ec ("IB/mlx5: Set mlx5_query_roce_port's return value to void")
Change set_roce_addr to use gid_set function, instead of directly
sending the command.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add support for extended error reporting.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for extended error reporting.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two entries being added at the same time to the IFLA
policy table, whilst parallel bug fixes to decnet
routing dst handling overlapping with the dst gc removal
in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename the qed_roce_if file to qed_rdma_if as it
represents a common interface for RoCE and iWARP.
This commit affects RDMA/qedr as well.
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename the functions common to both iWARP and RoCE to have a prefix of
_rdma_ instead of _roce_.
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once we have iWARP support, the qede portion of the qedr<->qede would
serve all the RDMA protocols - so rename the file to be appropriate
to its function.
While we're at it, we're also moving a couple of inclusions to it into
.h files and adding includes to make sure it contains all type
definitions it requires.
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
iWARP would require the chains to allocate/free their PBL memory
independently, so add the infrastructure to provide it externally.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename:
wait_queue_t => wait_queue_entry_t
'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
which had to carry the name.
Start sorting this out by renaming it to 'wait_queue_entry_t'.
This also allows the real structure name 'struct __wait_queue' to
lose its double underscore and become 'struct wait_queue_entry',
which is the more canonical nomenclature for such data types.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge tag 'mlx5-updates-2017-06-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox mlx5 updates and cleanups 2017-06-16
mlx5-updates-2017-06-16
This series provides some updates and cleanups for mlx5 core and netdevice
driver.
From Eli Cohen, add a missing event string.
From Or Gerlitz, some checkpatch cleanups.
From Moni, Disable HW level LAG when SRIOV is enabled.
From Tariq, A code reuse cleanup in aRFS flow.
From Itay Aveksis, Typo fix.
From Gal Pressman, ethtool statistics updates and "update stats" deferred work optimizations.
From Majd Dibbiny, Fast unload support on kernel shutdown.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.
Make these functions return void * and remove all the casts across
the tree, adding a (u8 *) cast only where the unsigned char pointer
was used directly, all done with the following spatch:
@@
expression SKB, LEN;
typedef u8;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
@@
- *(fn(SKB, LEN))
+ *(u8 *)fn(SKB, LEN)
@@
expression E, SKB, LEN;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
type T;
@@
- E = ((T *)(fn(SKB, LEN)))
+ E = fn(SKB, LEN)
@@
expression SKB, LEN;
identifier fn = { skb_push, __skb_push, skb_push_rcsum };
@@
- fn(SKB, LEN)[0]
+ *(u8 *)fn(SKB, LEN)
Note that the last part there converts from push(...)[0] to the
more idiomatic *(u8 *)push(...).
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.
Make these functions (skb_put, __skb_put and pskb_put) return void *
and remove all the casts across the tree, adding a (u8 *) cast only
where the unsigned char pointer was used directly, all done with the
following spatch:
@@
expression SKB, LEN;
typedef u8;
identifier fn = { skb_put, __skb_put };
@@
- *(fn(SKB, LEN))
+ *(u8 *)fn(SKB, LEN)
@@
expression E, SKB, LEN;
identifier fn = { skb_put, __skb_put };
type T;
@@
- E = ((T *)(fn(SKB, LEN)))
+ E = fn(SKB, LEN)
which actually doesn't cover pskb_put since there are only three
users overall.
A handful of stragglers were converted manually, notably a macro in
drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
instances in net/bluetooth/hci_sock.c. In the former file, I also
had to fix one whitespace problem spatch introduced.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
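The effect of the return-type change to the skb_put family above can be sketched in user space. struct toy_buf and the toy_put helpers are hypothetical stand-ins for an sk_buff and its helpers, and bounds checks are omitted for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an sk_buff: a fixed buffer with a tail offset. */
struct toy_buf {
	uint8_t data[64];
	size_t  len;
};

/* Old style: returns unsigned char *, so typed users need a cast. */
static unsigned char *toy_put_old(struct toy_buf *b, size_t len)
{
	unsigned char *tail = b->data + b->len;

	b->len += len;
	return tail;
}

/* New style: returns void *, which converts implicitly in C. */
static void *toy_put(struct toy_buf *b, size_t len)
{
	return toy_put_old(b, len);
}

struct toy_hdr {
	uint8_t type;
	uint8_t code;
};

static void fill(struct toy_buf *b)
{
	struct toy_hdr *h1 = (struct toy_hdr *)toy_put_old(b, sizeof(*h1));
	struct toy_hdr *h2 = toy_put(b, sizeof(*h2));    /* no cast needed */

	h1->type = 1;
	h2->type = 2;
	*(uint8_t *)toy_put(b, 1) = 0xff;   /* direct byte use keeps a cast */
}
```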
There were many places that my previous spatch didn't find,
as pointed out by yuan linyu in various patches.
The following spatch found many more and also removes the
now unnecessary casts:
@@
identifier p, p2;
expression len;
expression skb;
type t, t2;
@@
(
-p = skb_put(skb, len);
+p = skb_put_zero(skb, len);
|
-p = (t)skb_put(skb, len);
+p = skb_put_zero(skb, len);
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, len);
|
-memset(p, 0, len);
)
@@
type t, t2;
identifier p, p2;
expression skb;
@@
t *p;
...
(
-p = skb_put(skb, sizeof(t));
+p = skb_put_zero(skb, sizeof(t));
|
-p = (t *)skb_put(skb, sizeof(t));
+p = skb_put_zero(skb, sizeof(t));
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, sizeof(*p));
|
-memset(p, 0, sizeof(*p));
)
@@
expression skb, len;
@@
-memset(skb_put(skb, len), 0, len);
+skb_put_zero(skb, len);
Apply it to the tree (with one manual fixup to keep the
comment in vxlan.c, which spatch removed.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
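The pattern that the spatch above collapses (reserve tail space, then memset() it to zero) can be sketched in user space. struct toy_buf and both helpers are hypothetical stand-ins, and bounds checks are omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_buf {
	uint8_t data[64];
	size_t  len;
};

/* Reserve len bytes at the tail and return a pointer to them. */
static void *toy_put(struct toy_buf *b, size_t len)
{
	void *tail = b->data + b->len;

	b->len += len;
	return tail;
}

/* Reserve len bytes at the tail and zero them in one call. */
static void *toy_put_zero(struct toy_buf *b, size_t len)
{
	void *tail = toy_put(b, len);

	memset(tail, 0, len);
	return tail;
}
```

Call sites that used to pair a put with a separate memset() of the same length become a single call, which also removes the chance of the two lengths drifting apart.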
- A fix for commit eea40b8f62 ("infiniband: call ipv6 route lookup via the
stub interface")
- 6 patches against bnxt_re...the first two are considerably larger than
I would like, but as they address real issues I went ahead and
submitted them (it also helped that a good deal of the churn was
removing code repeated in multiple places and consolidating it to one
common function)
- 2 fixes against qedr that just came in
- 1 fix against rxe that took a few revisions to get right plus time to
get the proper reviews
- 5 late breaking IPoIB fixes
- 1 late cxgb4 fix
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"I had thought at the time of the last pull request that there wouldn't
be much more to go, but several things just kept trickling in over the
last week.
Instead of just the six patches to bnxt_re that I had anticipated,
there are another five IPoIB patches, two qedr patches, and a few
other miscellaneous patches.
The bnxt_re patches are more lines of diff than I like to submit this
late in the game. That's mostly because of the first two patches in
the series of six. I almost dropped them just because of the lines of
churn, but on a close review, a lot of the churn came from removing
duplicated code sections and consolidating them into callable
routines. I felt like this made the number of lines of change more
acceptable, and they address problems, so I left them. The remainder
of the patches are all small, well contained, and well understood.
These have passed 0day testing, but have not been submitted to
linux-next (but a local merge test with your current master was
without any conflicts).
Summary:
- A fix for commit eea40b8f62 ("infiniband: call ipv6 route lookup via
the stub interface")
- Six patches against bnxt_re...the first two are considerably larger
than I would like, but as they address real issues I went ahead and
submitted them (it also helped that a good deal of the churn was
removing code repeated in multiple places and consolidating it to
one common function)
- Two fixes against qedr that just came in
- One fix against rxe that took a few revisions to get right plus
time to get the proper reviews
- Five late breaking IPoIB fixes
- One late cxgb4 fix"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
rdma/cxgb4: Fix memory leaks during module exit
IB/ipoib: Fix memory leak in create child syscall
IB/ipoib: Fix access to un-initialized napi struct
IB/ipoib: Delete napi in device uninit default
IB/ipoib: Limit call to free rdma_netdev for capable devices
IB/ipoib: Fix memory leaks for child interfaces priv
rxe: Fix a sleep-in-atomic bug in post_one_send
RDMA/qedr: Add 64KB PAGE_SIZE support to user-space queues
RDMA/qedr: Initialize byte_len in WC of READ and SEND commands
RDMA/bnxt_re: Remove FMR support
RDMA/bnxt_re: Fix RQE posting logic
RDMA/bnxt_re: Add HW workaround for avoiding stall for UD QPs
RDMA/bnxt_re: Dereg MR in FW before freeing the fast_reg_page_list
RDMA/bnxt_re: HW workarounds for handling specific conditions
RDMA/bnxt_re: Fixing the Control path command and response handling
IB/addr: Fix setting source address in addr6_resolve()
Fixed a few places where endianness was misspelled and
one spot where output was:
CHECK: 'endianess' may be misspelled - perhaps 'endianness'?
CHECK: 'ouput' may be misspelled - perhaps 'output'?
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Pull crypto fix from Herbert Xu:
"This fixes a bug on sparc where we may dereference freed stack memory"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: Work around deallocated stack frame reference gcc bug on sparc.
The flow of creating a new child goes through ipoib_vlan_add
which allocates a new interface and checks the rtnl_lock.
If the lock is taken, restart_syscall will be called to restart
the system call again. In this case we are not releasing the
already allocated interface, causing a leak.
Fixes: 9baa0b0364 ("IB/ipoib: Add rtnl_link_ops support")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
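The leak in the child-creation flow above, and the shape of its fix, can be sketched in user space. All names here are hypothetical; lock_available stands in for the result of trying to take the lock, and -EAGAIN stands in for restarting the system call.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct child_if {
	int pkey;
};

static int live_allocs;   /* outstanding allocations, for illustration */

static struct child_if *child_alloc(int pkey)
{
	struct child_if *c = malloc(sizeof(*c));

	if (c) {
		c->pkey = pkey;
		live_allocs++;
	}
	return c;
}

static void child_free(struct child_if *c)
{
	free(c);
	live_allocs--;
}

/*
 * Hypothetical add path. When the lock cannot be taken the call will be
 * restarted, so the interface allocated so far must be released first.
 */
static int child_add(int pkey, bool lock_available, struct child_if **out)
{
	struct child_if *c = child_alloc(pkey);

	if (!c)
		return -ENOMEM;

	if (!lock_available) {
		child_free(c);          /* without this, c would leak */
		return -EAGAIN;         /* stand-in for restarting the call */
	}

	*out = c;
	return 0;
}
```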
This patch makes init_default and uninit_default symmetric
with a call to delete napi. Additionally, uninit_default
gained a delete napi call for the case where init_default fails.
Fixes: 515ed4f3aa ('IB/IPoIB: Separate control and data related initializations')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
There is a need to free priv explicitly and not just to release
the device. Child priv is freed explicitly in the remove flow, and this
patch also frees priv in the error flow of P_key creation
and in add_port.
Fixes: cd565b4b51 ('IB/IPoIB: Support acceleration options callbacks')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The driver may sleep under a spin lock, and the function call path is:
post_one_send (acquire the lock by spin_lock_irqsave)
init_send_wqe
copy_from_user --> may sleep
There is no flow that makes "qp->is_user" true, and copy_from_user may
cause a bug when a non-user pointer is used. So the copy_from_user lines
and the check of "qp->is_user" are removed.
Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add 64KB PAGE_SIZE support to user-space CQ, SQ and RQ queues.
De-facto it means that code was added to translate 64KB
pages to smaller 4KB pages that the FW can handle. Otherwise,
the FW would wrap (or jump to the next page) when reaching 4KB
while the user space library will continue on the same large page.
Note that MR code remains as is since the FW supports larger pages
for MRs.
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
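The 64KB to 4KB translation described above amounts to simple address arithmetic. The sketch below is an illustration under stated assumptions: split_host_page() and the FW_PAGE_* macros are hypothetical and do not reflect the driver's actual data structures.

```c
#include <stddef.h>
#include <stdint.h>

#define FW_PAGE_SHIFT 12                      /* 4KB pages seen by the FW */
#define FW_PAGE_SIZE  (1ULL << FW_PAGE_SHIFT)

/*
 * Expand one host page of host_page_size bytes starting at base into
 * consecutive 4KB page addresses. Returns the number of entries
 * written, or 0 if the arguments are invalid or out is too small.
 */
static size_t split_host_page(uint64_t base, uint64_t host_page_size,
			      uint64_t *out, size_t out_len)
{
	size_t n = host_page_size >> FW_PAGE_SHIFT;
	size_t i;

	if (n == 0 || n > out_len || (base & (FW_PAGE_SIZE - 1)))
		return 0;

	for (i = 0; i < n; i++)
		out[i] = base + i * FW_PAGE_SIZE;
	return n;
}
```

One 64KB host page therefore becomes sixteen 4KB entries, so the FW moving to its "next page" lands on the same host page that the user-space library keeps writing to.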
Initialize byte_len in work completion of RDMA_READ and RDMA_SEND.
Exposed by uDAPL application.
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Some issues were observed with the FMR implementation
while running stress traffic, so remove the
FMR verbs support for now.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch adds code to ring the RQ Doorbell aggressively
so that the adapter can DMA RQ buffers sooner, instead
of DMAing all WQEs in the post_recv WR list together at the
end of the post_recv verb.
Also use a spinlock to serialize RQ posting.
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
HW stalls out after 0x800000 WQEs are posted for UD QPs.
To work around this problem, the driver will send a modify_qp cmd
to the HW at around the halfway mark (0x400000) so that FW
can accordingly modify the QP context in the HW to prevent this
stall.
This workaround needs to be done for UD, QP1 and Raw Ethertype
packets. Added a counter to keep track of WQEs posted during post_send.
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
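The counter-based workaround for the UD QP stall above can be sketched as follows. The names are hypothetical, the modify_qp command is replaced by a stand-in, and resetting the counter after the workaround runs is an assumption of this sketch rather than something the text states.

```c
#include <stdbool.h>
#include <stdint.h>

#define UD_QP_HW_STALL 0x400000u   /* halfway mark from the text */

struct ud_qp {
	uint32_t wqe_cnt;          /* WQEs posted since the last workaround */
	uint32_t workarounds;      /* times the modify_qp stand-in ran */
};

/* Stand-in for sending a modify_qp command to the HW. */
static void ud_qp_stall_workaround(struct ud_qp *qp)
{
	qp->workarounds++;
	qp->wqe_cnt = 0;           /* assumption: restart counting afterwards */
}

/* Account for nwqes newly posted WQEs; returns true if the workaround ran. */
static bool ud_qp_account(struct ud_qp *qp, uint32_t nwqes)
{
	qp->wqe_cnt += nwqes;
	if (qp->wqe_cnt < UD_QP_HW_STALL)
		return false;
	ud_qp_stall_workaround(qp);
	return true;
}
```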
If the host buffers are freed before destroying MR in HW,
HW could try accessing these buffers. This could cause a host
crash. Fixing the code to avoid this condition.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch implements the following HW workarounds:
1. The SQ depth needs to be augmented by 128 + 1 to avoid running
into an out-of-order CQE issue.
2. A workaround to handle the problem where the HW fast path engine continues
to access DMA memory in retransmission mode even after the WQE has
already been completed. If the HW reports this condition, the driver detects
it and posts a Fence WQE. The driver stops reporting the completions
to the stack until it receives the completion for the Fence WQE.
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fix a concurrency issue with creq handling. Each caller
was given a globally managed crsq element, which was
accessed outside a lock. This could result in corruption
if a lot of applications are simultaneously issuing Control Path
commands. Now, each caller will provide its own response buffer
and the responses will be copied under a lock.
Also, fix the queue full condition check for the CMDQ.
As a part of these changes, the control path code is refactored
to remove the code replication in the response status checking.
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
We are trying to get rid of DRIVER_ATTR(), and all of the nes.c driver
attributes can be trivially changed to use DRIVER_ATTR_RW(), making the
code smaller and easier to manage over time.
Cc: Faisal Latif <faisal.latif@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: <linux-rdma@vger.kernel.org>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
LL2 today is interrupt driven - when tx/rx completion arrives [or any
other indication], qed needs to operate on the connection and pass
the information to the protocol-driver [or internal qed consumer].
Since we have several flavors of ll2 employed by the driver,
each handler needs to do an if-else to determine the right functionality
to use based on the connection type.
In order to make things more scalable [given that we're going to add
additional types of ll2 flavors] move the infrastructure into using
a callback-based approach - the callbacks would be provided as part
of the connection's initialization parameters.
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
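The callback-based approach described for LL2 above can be sketched as follows. The structures and function names are hypothetical and much smaller than the real interface; the point is only that completion paths dispatch through callbacks given at initialization instead of branching on the connection type.

```c
#include <stddef.h>

struct ll2_conn;

/* Callbacks supplied by the connection's owner at setup time. */
struct ll2_cbs {
	void (*rx_comp)(struct ll2_conn *conn, size_t len);
	void (*tx_comp)(struct ll2_conn *conn);
	void *cookie;              /* owner's private context */
};

struct ll2_conn {
	struct ll2_cbs cbs;
};

static void ll2_conn_init(struct ll2_conn *conn, const struct ll2_cbs *cbs)
{
	conn->cbs = *cbs;
}

/* Completion paths dispatch through the callbacks, with no type checks. */
static void ll2_handle_rx(struct ll2_conn *conn, size_t len)
{
	if (conn->cbs.rx_comp)
		conn->cbs.rx_comp(conn, len);
}

static void ll2_handle_tx(struct ll2_conn *conn)
{
	if (conn->cbs.tx_comp)
		conn->cbs.tx_comp(conn);
}

/* Example consumer: counts received bytes in its cookie. */
static void count_rx(struct ll2_conn *conn, size_t len)
{
	*(size_t *)conn->cbs.cookie += len;
}
```

Adding a new flavor then means supplying a new set of callbacks rather than extending every if-else chain.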
We want to move some TCP sysctls to net namespaces in the future.
tcp_window_scaling, tcp_sack and tcp_timestamps being fetched
from tcp_parse_options(), we need to pass an extra parameter.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On sparc, if we have an alloca() like situation, as is the case with
SHASH_DESC_ON_STACK(), we can end up referencing deallocated stack
memory. The result can be that the value is clobbered if a trap
or interrupt arrives at just the right instruction.
It only occurs if the function ends up returning a value from that
alloca() area and that value can be placed into the return value
register using a single instruction.
For example, in lib/libcrc32c.c:crc32c() we end up with a return
sequence like:
return %i7+8
lduw [%o5+16], %o0 ! MEM[(u32 *)__shash_desc.1_10 + 16B],
%o5 holds the base of the on-stack area allocated for the shash
descriptor. But the return released the stack frame and the
register window.
So if an interrupt arrives between 'return' and 'lduw', then
the value read at %o5+16 can be corrupted.
Add a data compiler barrier to work around this problem. This is
exactly what the gcc fix will end up doing as well, and it absolutely
should not change the code generated for other cpus (unless gcc
on them has the same bug :-)
With crucial insight from Eric Sandeen.
Cc: <stable@vger.kernel.org>
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
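The shape of the workaround in the sparc commit above can be sketched in user space, assuming a compiler that accepts GCC-style inline asm. sum_bytes() is a hypothetical function, not the crc32c code, and the macro below mirrors the usual form of a data compiler barrier.

```c
#include <stdint.h>
#include <string.h>

/* Compiler barrier that also treats ptr as used (GCC/Clang syntax). */
#define barrier_data(ptr) __asm__ __volatile__("" : : "r"(ptr) : "memory")

/*
 * Hypothetical function that keeps its state in a variable-length
 * on-stack area and returns a value read from it.
 * state_words must be at least 1.
 */
static uint32_t sum_bytes(const uint8_t *data, size_t len, size_t state_words)
{
	uint32_t state[state_words];   /* alloca()-like allocation */
	uint32_t ret;
	size_t i;

	memset(state, 0, sizeof(state));
	for (i = 0; i < len; i++)
		state[0] += data[i];

	ret = state[0];                /* load while the frame is still live */
	barrier_data(state);           /* keep the load ahead of the return */
	return ret;
}
```

On unaffected compilers and CPUs the barrier emits no instructions; it only constrains how the compiler may order the load relative to the function return.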
Remove date and bump version for mlx4_ib driver.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The adapter consumes two tids for every ipv6 offload
connection, be it active or passive; calculate tid usage
count accordingly.
Also change the signatures of relevant functions to get
the address family.
Signed-off-by: Rizwan Ansari <rizwana@chelsio.com>
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
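The tid accounting rule above (two tids per ipv6 offload connection, one otherwise) can be sketched as follows. The enum, the table structure and the helper names are hypothetical.

```c
#include <stdbool.h>

enum addr_family { FAMILY_IPV4, FAMILY_IPV6 };   /* hypothetical */

struct tid_table {
	unsigned int ntids;        /* total TIDs available */
	unsigned int in_use;       /* TIDs currently consumed */
};

/* An IPv6 offload connection consumes two TIDs, IPv4 one. */
static unsigned int tids_for(enum addr_family family)
{
	return family == FAMILY_IPV6 ? 2 : 1;
}

/* Reserve the TIDs for one connection; fails if too few remain. */
static bool tid_reserve(struct tid_table *t, enum addr_family family)
{
	unsigned int need = tids_for(family);

	if (t->ntids - t->in_use < need)
		return false;
	t->in_use += need;
	return true;
}

/* Give back the TIDs of one connection of the given family. */
static void tid_release(struct tid_table *t, enum addr_family family)
{
	t->in_use -= tids_for(family);
}
```

This is also why the relevant functions need the address family as a parameter: the release path must subtract the same amount the reserve path added.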
Commit eea40b8f62 ("infiniband: call ipv6 route lookup via the stub
interface") introduced a regression in address resolution when connecting
to IPv6 destination addresses. The old code called ip6_route_output(),
while the new code calls ipv6_stub->ipv6_dst_lookup(). The two are almost
the same, except that ipv6_dst_lookup() also calls ip6_route_get_saddr()
if the source address is in6addr_any.
This means that the test of ipv6_addr_any(&fl6.saddr) now never succeeds,
and so we never copy the source address out. This ends up causing
rdma_resolve_addr() to fail, because without a resolved source address,
cma_acquire_dev() will fail to find an RDMA device to use. For me, this
causes connecting to an NVMe over Fabrics target via RoCE / IPv6 to fail.
Fix this by copying out fl6.saddr if ipv6_addr_any() is true for the original
source address passed into addr6_resolve(). We can drop our call to
ipv6_dev_get_saddr() because ipv6_dst_lookup() already does that work.
Fixes: eea40b8f62 ("infiniband: call ipv6 route lookup via the stub interface")
Cc: <stable@vger.kernel.org> # 3.12+
Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
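The logic of the addr6_resolve() fix above can be sketched in user space. Everything here is a hypothetical stand-in: struct addr6 replaces the kernel address type, and toy_dst_lookup() imitates a route lookup that also picks a source address when none is set.

```c
#include <stdbool.h>
#include <string.h>

struct addr6 {
	unsigned char b[16];
};

static bool addr6_is_any(const struct addr6 *a)
{
	static const struct addr6 any;   /* all zeroes */

	return memcmp(a, &any, sizeof(any)) == 0;
}

/* Stand-in for a route lookup that also selects a source when unset. */
static void toy_dst_lookup(struct addr6 *flow_saddr, const struct addr6 *chosen)
{
	if (addr6_is_any(flow_saddr))
		*flow_saddr = *chosen;
}

static void toy_resolve(struct addr6 *src, const struct addr6 *route_src)
{
	struct addr6 flow_saddr = *src;

	toy_dst_lookup(&flow_saddr, route_src);

	/*
	 * Test the caller's original address, not flow_saddr: the lookup
	 * has already filled flow_saddr in, so testing it never succeeds.
	 */
	if (addr6_is_any(src))
		*src = flow_saddr;
}
```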
Commit 9fdca4da4d (IB/SA: Split struct sa_path_rec based on IB and
ROCE specific fields) moved the service_id to be a specific attribute
for IB and OPA SA Path Record, and thus it wasn't assigned for RoCE.
This caused the following kernel panic in the CMA request handler flow:
[ 27.074594] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 27.074731] IP: __radix_tree_lookup+0x1d/0xe0
...
[ 27.075356] Workqueue: ib_cm cm_work_handler [ib_cm]
[ 27.075401] task: ffff88022e3b8000 task.stack: ffffc90001298000
[ 27.075449] RIP: 0010:__radix_tree_lookup+0x1d/0xe0
...
[ 27.075979] Call Trace:
[ 27.076015] radix_tree_lookup+0xd/0x10
[ 27.076055] cma_ps_find+0x59/0x70 [rdma_cm]
[ 27.076097] cma_id_from_event+0xd2/0x470 [rdma_cm]
[ 27.076144] ? ib_init_ah_from_path+0x39a/0x590 [ib_core]
[ 27.076193] cma_req_handler+0x25/0x480 [rdma_cm]
[ 27.076237] cm_process_work+0x25/0x120 [ib_cm]
[ 27.076280] ? cm_get_bth_pkey.isra.62+0x3c/0xa0 [ib_cm]
[ 27.076350] cm_req_handler+0xb03/0xd40 [ib_cm]
[ 27.076430] ? sched_clock_cpu+0x11/0xb0
[ 27.076478] cm_work_handler+0x194/0x1588 [ib_cm]
[ 27.076525] process_one_work+0x160/0x410
[ 27.076565] worker_thread+0x137/0x4a0
[ 27.076614] kthread+0x112/0x150
[ 27.076684] ? max_active_store+0x60/0x60
[ 27.077642] ? kthread_park+0x90/0x90
[ 27.078530] ret_from_fork+0x2c/0x40
This patch moves it back to the common SA Path Record structure
and removes the redundant setter and getter.
Tested on Connect-IB and Connect-X4 in Infiniband and RoCE respectively.
Fixes: 9fdca4da4d (IB/SA: Split struct sa_path_rec based on IB and
ROCE specific fields)
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This change optimizes kernel memory deregistration operations.
__ib_umem_release() used to call set_page_dirty_lock() against every
writable page in its memory region. The purpose is to keep data
synced between the CPU and the DMA device when swapping happens after
memory deregistration. Now we do not set the page dirty bit if it is
already set by the kernel prior to calling __ib_umem_release(). This
reduced memory deregistration time by half or more when we ran an
application simulation test program.
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
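The optimization to memory deregistration above boils down to skipping the expensive call for pages that are already dirty. The user-space sketch below uses hypothetical stand-ins: struct toy_page for a page and toy_set_page_dirty_lock() for the locked dirty-marking call.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
	bool dirty;
};

static unsigned int lock_calls;   /* counts the expensive path */

/* Stand-in for the costly locked dirty-marking call. */
static void toy_set_page_dirty_lock(struct toy_page *p)
{
	lock_calls++;
	p->dirty = true;
}

/* Mark pages of a writable region dirty, skipping already-dirty ones. */
static void toy_release(struct toy_page *pages, size_t n, bool writable)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (writable && !pages[i].dirty)
			toy_set_page_dirty_lock(&pages[i]);
}
```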
Commit 5752075144 ("IB/SA: Add OPA path record type") introduced
new local function __ib_copy_path_rec_to_user, but didn't limit its
scope. This produces the following sparse warning:
drivers/infiniband/core/uverbs_marshall.c:99:6: warning:
symbol '__ib_copy_path_rec_to_user' was not declared. Should it be
static?
In addition, it used sizeof ... notations instead of sizeof(...), which
is correct in C, but a little bit misleading. Let's change it too.
Fixes: 5752075144 ("IB/SA: Add OPA path record type")
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
RDMA netlink is part of ib_core, hence ibnl_chk_listeners(),
ibnl_init() and ibnl_cleanup() don't need to be published
in a public header file.
Let's remove EXPORT_SYMBOL from ibnl_chk_listeners() and move all these
functions to a private header file.
CC: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If srp_init_qp() fails at srp_create_ch_ib() then ch->send_cq
may be NULL.
Calling directly to ib_destroy_qp() is sufficient because
no work requests were posted on the created qp.
Fixes: 9294000d6d ("IB/srp: Drain the send queue before destroying a QP")
Cc: <stable@vger.kernel.org>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Bart van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
ipoib_dev_uninit_default() call is used in ipoib_main.c file only
and it generates the following warning from smatch tool:
drivers/infiniband/ulp/ipoib/ipoib_main.c:1593:6: warning:
symbol 'ipoib_dev_uninit_default' was not declared. Should it
be static?
so let's declare that function as static.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Cache the needed umr_fence and set the wqe ctrl segment
accordingly.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The cited patch added a type field to structures ib_ah and rdma_ah_attr.
Function mlx4_ib_query_ah() builds an rdma_ah_attr structure from the
data in an mlx4_ib_ah structure (which contains both an ib_ah structure
and an address vector).
For mlx4_ib_query_ah() to work properly, the type field in the contained
ib_ah structure must be set correctly.
In the outgoing MAD tunneling flow, procedure mlx4_ib_multiplex_mad()
paravirtualizes a MAD received from a slave and sends the processed
mad out over the wire. During this processing, it populates an
mlx4_ib_ah structure and calls mlx4_ib_query_ah().
The cited commit overlooked setting the type field in the contained
ib_ah structure before invoking mlx4_ib_query_ah(). As a result, the
type field remained uninitialized, and the rdma_ah_attr structure was
incorrectly built. This resulted in improperly built MADs being sent out
over the wire.
This patch properly initializes the type field in the contained ib_ah
structure before calling mlx4_ib_query_ah(). The rdma_ah_attr structure
is then generated correctly.
Fixes: 44c58487d5 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The handling of IB_RDMA_WRITE_ONLY_WITH_IMMEDIATE will leak a memory
reference when a buffer cannot be allocated for returning the immediate
data.
The issue is that the rkey validation has already occurred and the RNR
nak fails to release the reference that was fruitlessly gotten. The
peer will send the identical single packet request when its RNR
timer pops.
The fix is to release the held reference prior to the rnr nak exit.
This is the only sequence that requires both rkey validation and the
buffer allocation on the same packet.
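A minimal userspace sketch of the leak and its fix, assuming a simple
reference counter. `MemRegion` and `handle_write_with_immediate` are
hypothetical names for illustration and do not mirror the driver's
structures.

```python
class MemRegion:
    """Toy memory region with a reference count."""

    def __init__(self):
        self.refcount = 0

    def get(self):
        self.refcount += 1

    def put(self):
        self.refcount -= 1


def handle_write_with_immediate(mr, rkey_ok, buffer_available):
    if not rkey_ok:
        return "nak_access"
    mr.get()  # reference taken by a successful rkey validation
    if not buffer_available:
        mr.put()  # the fix: drop the reference before the RNR NAK exit
        return "rnr_nak"
    # ... data would be placed here; the reference is dropped on completion
    mr.put()
    return "ok"


mr = MemRegion()
# The peer retries the same packet each time its RNR timer pops.
results = [handle_write_with_immediate(mr, True, False) for _ in range(3)]
```

Without the `put()` on the RNR NAK path, each retry in this model would
leave one more reference behind.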
Cc: Stable <stable@vger.kernel.org> # 4.7+
Tested-by: Tadeusz Struk <tadeusz.struk@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Keep VL15 credits at 0 during LNI, before link-up. Store the
VL15 credits value during the verify cap interrupt and set
it after link-up. This addresses an issue where VL15 MAD
packets could be sent by one side of the link before
the other side is ready to receive them.
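The ordering can be illustrated with a toy state model. The class and
method names here are assumptions made for the sketch, not the hfi1
driver's functions.

```python
class Link:
    """Toy link model: credits are saved early but applied only at link-up."""

    def __init__(self):
        self.vl15_credits = 0  # value currently programmed
        self._saved_vl15 = 0   # value learned during verify cap
        self.up = False

    def on_verify_cap(self, credits):
        # Remember the value, but keep the programmed credits at 0.
        self._saved_vl15 = credits

    def on_link_up(self):
        self.up = True
        self.vl15_credits = self._saved_vl15

    def can_send_vl15(self):
        return self.vl15_credits > 0


link = Link()
link.on_verify_cap(8)
before = link.can_send_vl15()
link.on_link_up()
after = link.can_send_vl15()
```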
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The Omni-Path adapter driver fails to load on the ppc64le platform
due to invalid PCI setup.
This patch makes the PCI configuration more robust and will
fix 64 bit addressing for ppc64le.
Signed-off-by: Steven L Roberts <robers97@gmail.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This fixes a kernel panic when loading the hfi driver as a dynamic module.
Signed-off-by: Steven L Roberts <robers97@gmail.com>
Reviewed-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Account for ipv6 when computing the header length used to deduce the mtu
size of ipv6 servers. Because the header length was computed incorrectly
for ipv6 servers, a wrong mss was reported to the peer (client).
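As a worked example of why the address family matters, here is a minimal
sketch that derives an MSS from an MTU using the fixed base header sizes
(20 bytes for IPv4, 40 for IPv6, 20 for TCP). It ignores TCP options and
is not the driver's computation; `mss_for` is a hypothetical helper.

```python
IPV4_HDR = 20  # bytes, base IPv4 header without options
IPV6_HDR = 40  # bytes, fixed IPv6 header
TCP_HDR = 20   # bytes, base TCP header without options


def mss_for(mtu, ipv6):
    """Return the TCP payload size that fits in one MTU-sized packet."""
    ip_hdr = IPV6_HDR if ipv6 else IPV4_HDR
    return mtu - ip_hdr - TCP_HDR


mss_v4 = mss_for(1500, ipv6=False)
mss_v6 = mss_for(1500, ipv6=True)
```

Using the IPv4 header length for an IPv6 connection would overstate the
MSS by 20 bytes in this model.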
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The patch 761e19a504 (RDMA/iw_cxgb4: Handle return value of
c4iw_ofld_send() in abort_arp_failure()) from May 6, 2016
leads to the following static checker warning:
drivers/infiniband/hw/cxgb4/cm.c:575 abort_arp_failure()
warn: passing freed memory 'skb'
Also fixes skb leak when l2t resolution fails
Fixes: 761e19a504 (RDMA/iw_cxgb4: Handle return value of
c4iw_ofld_send() in abort_arp_failure())
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Explicitly ACK the MPA Reply frame so the peer
does not retransmit the frame.
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Don't set control flag for 0-length FULPDU (Send) RTR indication
in the enhanced MPA Request/Reply frames, because it isn't supported.
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Some error paths in i40iw_initialize_dev are doing
additional and unnecessary work before exiting.
Correctly free resources allocated prior to error
and return with correct status code.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intelcom>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Explicitly ACK the MPA Reject frame so the peer does
not retransmit the frame.
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Don't set control flag for 0-length FULPDU (Send)
RTR indication in the enhanced MPA Request/Reply
frames, because it isn't supported.
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Allocate and free a security context when creating and destroying a MAD
agent. This context is used for controlling access to PKeys and sending
and receiving SMPs.
When sending or receiving a MAD check that the agent has permission to
access the PKey for the Subnet Prefix of the port.
During MAD and snoop agent registration for SMI QPs check that the
calling process has permission to manage the subnet and
register a callback with the LSM to be notified of policy changes. When
notification of a policy change occurs recheck permission and set a flag
indicating sending and receiving SMPs is allowed.
When sending and receiving MADs check that the agent has access to the
SMI if it's on an SMI QP. Because security policy can change it's
possible permission was allowed when creating the agent, but no longer
is.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Acked-by: Doug Ledford <dledford@redhat.com>
[PM: remove the LSM hook init code]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add a generic notification mechanism in the LSM. Interested consumers
can register a callback with the LSM and security modules can produce
events.
Because access to Infiniband QPs is enforced in the setup phase of a
connection, security should be enforced again if the policy changes.
Register infiniband devices for policy change notification and check all
QPs on that device when the notification is received.
Add a call to the notification mechanism from SELinux when the AVC
cache changes or setenforce is cleared.
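The register/notify pattern can be sketched in userspace as follows.
`Notifier`, `Device` and the "policy_change" event string are hypothetical
names chosen for the illustration; they are not the LSM interface.

```python
class Notifier:
    """Toy notifier chain: consumers register callbacks, producers notify."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        self._callbacks.append(callback)

    def notify(self, event):
        for callback in list(self._callbacks):
            callback(event)


class Device:
    def __init__(self, qps, policy):
        self.qps = qps        # QP name -> state
        self.policy = policy  # set of QP names currently permitted

    def on_policy_change(self, event):
        if event != "policy_change":
            return
        # Re-check every QP on the device against the new policy.
        for name in self.qps:
            if name not in self.policy:
                self.qps[name] = "error"


notifier = Notifier()
policy = {"qp0", "qp1"}
dev = Device({"qp0": "ready", "qp1": "ready"}, policy)
notifier.register(dev.on_policy_change)

policy.discard("qp1")             # the policy changes after setup
notifier.notify("policy_change")  # the security module produces an event
```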
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add new LSM hooks to allocate and free security contexts and check for
permission to access a PKey.
Allocate and free a security context when creating and destroying a QP.
This context is used for controlling access to PKeys.
When a request is made to modify a QP that changes the port, PKey index,
or alternate path, check that the QP has permission for the PKey in the
PKey table index on the subnet prefix of the port. If the QP is shared
make sure all handles to the QP also have access.
Store which port and PKey index a QP is using. After the reset to init
transition the user can modify the port, PKey index and alternate path
independently. So port and PKey settings changes can be a merge of the
previous settings and the new ones.
In order to maintain access control if there is a PKey table or subnet
prefix change, keep a list of all QPs that are using each PKey index on
each port. If a change occurs all QPs using that device and port must
have access enforced for the new cache settings.
These changes add a transaction to the QP modify process. Association
with the old port and PKey index must be maintained if the modify fails,
and must be removed if it succeeds. Association with the new port and
PKey index must be established prior to the modify and removed if the
modify fails.
1. When a QP is modified to a particular Port, PKey index or alternate
path insert that QP into the appropriate lists.
2. Check permission to access the new settings.
3. If step 2 grants access attempt to modify the QP.
4a. If steps 2 and 3 succeed remove any prior associations.
4b. If either fails remove the new setting associations.
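The numbered steps can be modelled as a small transaction. This is a
userspace sketch under stated assumptions: a QP is a dict, a setting is a
(port, pkey_index) tuple, and `has_permission` and `hw_modify` are
hypothetical callbacks standing in for the LSM check and the driver call.

```python
from collections import defaultdict


def modify_qp(lists, qp, new_setting, has_permission, hw_modify):
    """Apply new_setting to qp, keeping the per-setting QP lists consistent."""
    old_setting = qp.get("setting")

    # Step 1: associate the QP with the new setting up front.
    lists[new_setting].add(qp["id"])

    # Steps 2 and 3: check permission, and only then attempt the modify.
    ok = has_permission(qp, new_setting) and hw_modify(qp, new_setting)

    if ok:
        # Step 4a: drop the prior association.
        if old_setting is not None and old_setting != new_setting:
            lists[old_setting].discard(qp["id"])
        qp["setting"] = new_setting
    elif old_setting != new_setting:
        # Step 4b: roll back the new association.
        lists[new_setting].discard(qp["id"])
    return ok


lists = defaultdict(set)
qp = {"id": 1, "setting": (1, 0)}
lists[(1, 0)].add(1)

denied = modify_qp(lists, qp, (1, 2), lambda q, s: False, lambda q, s: True)
granted = modify_qp(lists, qp, (1, 2), lambda q, s: True, lambda q, s: True)
hw_failed = modify_qp(lists, qp, (2, 0), lambda q, s: True, lambda q, s: False)
```

Inserting into the new list before the check means a concurrent cache
change would still find the QP, which is why the rollback in step 4b is
needed.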
If a PKey table or subnet prefix changes walk the list of QPs and
check that they have permission. If not send the QP to the error state
and raise a fatal error event. If it's a shared QP make sure all the
QPs that share the real_qp have permission as well. If the QP that
owns a security structure is denied access the security structure is
marked as such and the QP is added to an error_list. Once moving
the QP to error is complete the security structure mark is cleared.
Maintaining the lists correctly turns QP destroy into a transaction.
The hardware driver for the device frees the ib_qp structure, so while
the destroy is in progress the ib_qp pointer in the ib_qp_security
struct is undefined. When the destroy process begins the ib_qp_security
structure is marked as destroying. This prevents any action from being
taken on the QP pointer. After the QP is destroyed successfully it
could still be listed on an error_list; wait for it to be processed by
that flow before cleaning up the structure.
If the destroy fails the QP's port and PKey settings are reinserted into
the appropriate lists, the destroying flag is cleared, and access control
is enforced, in case there were any cache changes during the destroy
flow.
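A rough userspace model of the destroy transaction, assuming the QP's list
membership is removed when the destroy begins. `QpSecurity`, `destroy_qp`,
`hw_destroy` and `enforce` are hypothetical names, and the error_list wait
is left out of the sketch.

```python
from collections import defaultdict


class QpSecurity:
    def __init__(self, qp_id, setting):
        self.qp_id = qp_id
        self.setting = setting  # (port, pkey_index)
        self.destroying = False


def destroy_qp(lists, sec, hw_destroy, enforce):
    """Destroy a QP; on failure restore its list membership and re-enforce."""
    sec.destroying = True  # no action may touch the QP pointer from here on
    lists[sec.setting].discard(sec.qp_id)

    if hw_destroy(sec.qp_id):
        return True

    # Destroy failed: undo the removal and re-check access, since the
    # cache may have changed while the destroy was in flight.
    lists[sec.setting].add(sec.qp_id)
    sec.destroying = False
    enforce(sec)
    return False


lists = defaultdict(set)
sec = QpSecurity(7, (1, 0))
lists[(1, 0)].add(7)
enforced = []

failed = destroy_qp(lists, sec, lambda qp_id: False, enforced.append)
restored = 7 in lists[(1, 0)]
succeeded = destroy_qp(lists, sec, lambda qp_id: True, enforced.append)
```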
To keep the security changes isolated a new file is used to hold security
related functionality.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Acked-by: Doug Ledford <dledford@redhat.com>
[PM: merge fixup in ib_verbs.h and uverbs_cmd.c]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Cache the subnet prefix and add a function to access it. Enforcing
security requires frequent queries of the subnet prefix and the pkeys in
the pkey table.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Commit a7c3e901a4 ("mm: introduce kv[mz]alloc helpers") added
proper implementation of mlx5_vzalloc function to the MM core.
This made the mlx5_vzalloc function useless, so let's remove it.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Pull SCSI target updates from Nicholas Bellinger:
"Things were a lot more calm than previously expected. It's primarily
fixes in various areas, with most of the new functionality centering
around TCMU backend driver work that Xiubo Li has been driving.
Here's the summary on the feature side:
- Make T10-PI verify configurable for emulated (FILEIO + RD) backends
(Dmitry Monakhov)
- Allow target-core/TCMU pass-through to use in-kernel SPC-PR logic
(Bryant Ly + MNC)
- Add TCMU support for growing ring buffer size (Xiubo Li + MNC)
- Add TCMU support for global block data pool (Xiubo Li + MNC)
and on the bug-fix side:
- Fix COMPARE_AND_WRITE non GOOD status handling for READ phase
failures (Gary Guo + nab)
- Fix iscsi-target hang with explicitly changing per NodeACL
CmdSN number depth with concurrent login driven session
reinstatement. (Gary Guo + nab)
- Fix ibmvscsis fabric driver ABORT task handling (Bryant Ly)
- Fix target-core/FILEIO zero length handling (Bart Van Assche)
Also, there was an OOPs introduced with the WRITE_VERIFY changes that
I ended up reverting at the last minute, because as not unusual Bart
and I could not agree on the fix in time for -rc1. Since it's specific
to a conformance test, it's been reverted for now.
There is a separate patch in the queue to address the underlying
control CDB write overflow regression in >= v4.3 separate from the
WRITE_VERIFY revert here, that will be pushed post -rc1"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (30 commits)
Revert "target: Fix VERIFY and WRITE VERIFY command parsing"
IB/srpt: Avoid that aborting a command triggers a kernel warning
IB/srpt: Fix abort handling
target/fileio: Fix zero-length READ and WRITE handling
ibmvscsis: Do not send aborted task response
tcmu: fix module removal due to stuck thread
target: Don't force session reset if queue_depth does not change
iscsi-target: Set session_fall_back_to_erl0 when forcing reinstatement
target: Fix compare_and_write_callback handling for non GOOD status
tcmu: Recalculate the tcmu_cmd size to save cmd area memories
tcmu: Add global data block pool support
tcmu: Add dynamic growing data area feature support
target: fixup error message in target_tg_pt_gp_tg_pt_gp_id_store()
target: fixup error message in target_tg_pt_gp_alua_access_type_store()
target/user: PGR Support
target: Add WRITE_VERIFY_16
Documentation/target: add an example script to configure an iSCSI target
target: Use kmalloc_array() in transport_kmap_data_sg()
target: Use kmalloc_array() in compare_and_write_callback()
target: Improve size determinations in two functions
...
This includes:
* Some code optimizations for the Intel VT-d driver
* Code to switch off a previously enabled Intel IOMMU
* Support for 'struct iommu_device' for OMAP, Rockchip and
Mediatek IOMMUs
* Some header optimizations for IOMMU core code headers and a
few fixes that became necessary in other parts of the kernel
because of that
* ACPI/IORT updates and fixes
* Some Exynos IOMMU optimizations
* Code updates for the IOMMU dma-api code to bring it closer to
use per-cpu iova caches
* New command-line option to set default domain type allocated
by the iommu core code
* Another command line option to allow the Intel IOMMU switched
off in a tboot environment
* ARM/SMMU: TLB sync optimisations for SMMUv2, Support for using
an IDENTITY domain in conjunction with DMA ops, Support for
SMR masking, Support for 16-bit ASIDs (was previously broken)
* Various other small fixes and improvements
Merge tag 'iommu-updates-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
- code optimizations for the Intel VT-d driver
- ability to switch off a previously enabled Intel IOMMU
- support for 'struct iommu_device' for OMAP, Rockchip and Mediatek
IOMMUs
- header optimizations for IOMMU core code headers and a few fixes that
became necessary in other parts of the kernel because of that
- ACPI/IORT updates and fixes
- Exynos IOMMU optimizations
- updates for the IOMMU dma-api code to bring it closer to use per-cpu
iova caches
- new command-line option to set default domain type allocated by the
iommu core code
- another command line option to allow the Intel IOMMU switched off in
a tboot environment
- ARM/SMMU: TLB sync optimisations for SMMUv2, Support for using an
IDENTITY domain in conjunction with DMA ops, Support for SMR masking,
Support for 16-bit ASIDs (was previously broken)
- various other small fixes and improvements
* tag 'iommu-updates-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (63 commits)
soc/qbman: Move dma-mapping.h include to qman_priv.h
soc/qbman: Fix implicit header dependency now causing build fails
iommu: Remove trace-events include from iommu.h
iommu: Remove pci.h include from trace/events/iommu.h
arm: dma-mapping: Don't override dma_ops in arch_setup_dma_ops()
ACPI/IORT: Fix CONFIG_IOMMU_API dependency
iommu/vt-d: Don't print the failure message when booting non-kdump kernel
iommu: Move report_iommu_fault() to iommu.c
iommu: Include device.h in iommu.h
x86, iommu/vt-d: Add an option to disable Intel IOMMU force on
iommu/arm-smmu: Return IOVA in iova_to_phys when SMMU is bypassed
iommu/arm-smmu: Correct sid to mask
iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid()
iommu: Make iommu_bus_notifier return NOTIFY_DONE rather than error code
omap3isp: Remove iommu_group related code
iommu/omap: Add iommu-group support
iommu/omap: Make use of 'struct iommu_device'
iommu/omap: Store iommu_dev pointer in arch_data
iommu/omap: Move data structures to omap-iommu.h
iommu/omap: Drop legacy-style device support
...
Pull misc vfs updates from Al Viro:
"Assorted bits and pieces from various people. No common topic in this
pile, sorry"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/affs: add rename exchange
fs/affs: add rename2 to prepare multiple methods
Make stat/lstat/fstatat pass AT_NO_AUTOMOUNT to vfs_statx()
fs: don't set *REFERENCED on single use objects
fs: compat: Remove warning from COMPATIBLE_IOCTL
remove pointless extern of atime_need_update_rcu()
fs: completely ignore unknown open flags
fs: add a VALID_OPEN_FLAGS
fs: remove _submit_bh()
fs: constify tree_descr arrays passed to simple_fill_super()
fs: drop duplicate header percpu-rwsem.h
fs/affs: bugfix: Write files greater than page size on OFS
fs/affs: bugfix: enable writes on OFS disks
fs/affs: remove node generation check
fs/affs: import amigaffs.h
fs/affs: bugfix: make symbolic links work again
- mlx5/IPoIB fixup patch
mergetag object 67cf3623e0
type commit
tag for-next
tagger Doug Ledford <dledford@redhat.com> 1493940800 -0400
Updates #3 for 4.12 kernel merge window
- The hfi1 15 patch set that landed late
- IPoIB get_link_ksettings which landed late because I asked for a
respin
- One late rxe change
- One -rc worthy fix that's in early
Merge tags 'for-linus' and 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull more rdma updates from Doug Ledford:
"As mentioned in my first pull request, this is the subsequent pull
requests I had. This is all I have, and in fact this cleans out the
RDMA subsystem's entire patchworks queue of kernel changes that are
ready to go (well, it did for the weekend anyway, a few new patches
are in, but they'll be coming during the -rc cycle).
The first tag contains a single patch that would have conflicted if
taken from my tree or DaveM's tree as it needed our trees merged to
come cleanly.
The second tag contains the patch series from Intel plus three other
stragglers that came in late last week. I took them because it allowed
me to legitimately claim that the RDMA patchworks queue was, for a
short time, 100% cleared of all waiting kernel patches, woohoo! :-).
I have it under my for-next tag, so it did get 0day and linux-next
over the end of last week, and linux-next did show one minor conflict.
Summary:
'for-linus' tag:
- mlx5/IPoIB fixup patch
'for-next' tag:
- the hfi1 15 patch set that landed late
- IPoIB get_link_ksettings which landed late because I asked for a
respin
- one late rxe change
- one -rc worthy fix that's in early"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
IB/mlx5: Enable IPoIB acceleration
* tag 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
rxe: expose num_possible_cpus() cnum_comp_vectors
IB/rxe: Update caller's CRC for RXE_MEM_TYPE_DMA memory type
IB/hfi1: Clean up on context initialization failure
IB/hfi1: Fix an assign/ordering issue with shared context IDs
IB/hfi1: Clean up context initialization
IB/hfi1: Correctly clear the pkey
IB/hfi1: Search shared contexts on the opened device, not all devices
IB/hfi1: Remove atomic operations for SDMA_REQ_HAVE_AHG bit
IB/hfi1: Use filedata rather than filepointer
IB/hfi1: Name function prototype parameters
IB/hfi1: Fix a subcontext memory leak
IB/hfi1: Return an error on memory allocation failure
IB/hfi1: Adjust default eager_buffer_size to 8MB
IB/hfi1: Get rid of divide when setting the tx request header
IB/hfi1: Fix yield logic in send engine
IB/hfi1, IB/rdmavt: Move r_adefered to r_lock cache line
IB/hfi1: Fix checks for Offline transient state
IB/ipoib: add get_link_ksettings in ethtool
Merge tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- add framework for supporting PCIe devices in Endpoint mode (Kishon
Vijay Abraham I)
- use non-postable PCI config space mappings when possible (Lorenzo
Pieralisi)
- clean up and unify mmap of PCI BARs (David Woodhouse)
- export and unify Function Level Reset support (Christoph Hellwig)
- avoid FLR for Intel 82579 NICs (Sasha Neftin)
- add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig)
- short-circuit config access failures for disconnected devices (Keith
Busch)
- remove D3 sleep delay when possible (Adrian Hunter)
- freeze PME scan before suspending devices (Lukas Wunner)
- stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava)
- disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann)
- add arch-specific alignment control to improve device passthrough by
avoiding multiple BARs in a page (Yongji Xie)
- add sysfs sriov_drivers_autoprobe to control VF driver binding
(Bodong Wang)
- allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas)
- fix crashes when unbinding host controllers that don't support
removal (Brian Norris)
- add driver for MicroSemi Switchtec management interface (Logan
Gunthorpe)
- add driver for Faraday Technology FTPCI100 host bridge (Linus
Walleij)
- add i.MX7D support (Andrey Smirnov)
- use generic MSI support for Aardvark (Thomas Petazzoni)
- make Rockchip driver modular (Brian Norris)
- advertise 128-byte Read Completion Boundary support for Rockchip
(Shawn Lin)
- advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin)
- convert atomic_t to refcount_t in HV driver (Elena Reshetova)
- add CPU IRQ affinity in HV driver (K. Y. Srinivasan)
- fix PCI bus removal in HV driver (Long Li)
- add support for ThunderX2 DMA alias topology (Jayachandran C)
- add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki)
- add ITE 8893 bridge DMA alias quirk (Jarod Wilson)
- restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices
(Manish Jaggi)
* tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits)
PCI: Don't allow unbinding host controllers that aren't prepared
ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
MAINTAINERS: Add PCI Endpoint maintainer
Documentation: PCI: Add userguide for PCI endpoint test function
tools: PCI: Add sample test script to invoke pcitest
tools: PCI: Add a userspace tool to test PCI endpoint
Documentation: misc-devices: Add Documentation for pci-endpoint-test driver
misc: Add host side PCI driver for PCI test function device
PCI: Add device IDs for DRA74x and DRA72x
dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access
PCI: dwc: dra7xx: Workaround for errata id i870
dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode
PCI: dwc: dra7xx: Add EP mode support
PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently
dt-bindings: PCI: Add DT bindings for PCI designware EP mode
PCI: dwc: designware: Add EP mode support
Documentation: PCI: Add binding documentation for pci-test endpoint function
ixgbe: Use pcie_flr() instead of duplicating it
IB/hfi1: Use pcie_flr() instead of duplicating it
PCI: imx6: Fix spelling mistake: "contol" -> "control"
...
This typo is quite common. Fix it and add it to the spelling file so
that checkpatch catches it earlier.
Link: http://lkml.kernel.org/r/20170317011131.6881-2-sboyd@codeaurora.org
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix typos and add the following to the scripts/spelling.txt:
momery||memory
Link: http://lkml.kernel.org/r/20170317011131.6881-1-sboyd@codeaurora.org
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Let the target core check the CMD_T_ABORTED flag instead of the SRP
target driver. Hence remove the transport_check_aborted_status()
call. Since state == SRPT_STATE_CMD_RSP_SENT is something that really
should not happen, do not try to recover if srpt_queue_response() is
called for an I/O context that is in that state. This patch is a bug
fix because the srpt_abort_cmd() call is misplaced - if that function
is called from srpt_queue_response() it should either be called
before the command state is changed or after the response has been
sent.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Here is the big set of new char/misc driver drivers and features for
4.12-rc1.
There's lots of new drivers added this time around, new firmware drivers
from Google, more auxdisplay drivers, extcon drivers, fpga drivers, and
a bunch of other driver updates. Nothing major, except if you happen to
have the hardware for these drivers, and then you will be happy :)
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'char-misc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big set of new char/misc driver drivers and features for
4.12-rc1.
There's lots of new drivers added this time around, new firmware
drivers from Google, more auxdisplay drivers, extcon drivers, fpga
drivers, and a bunch of other driver updates. Nothing major, except if
you happen to have the hardware for these drivers, and then you will
be happy :)
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits)
firmware: google memconsole: Fix return value check in platform_memconsole_init()
firmware: Google VPD: Fix return value check in vpd_platform_init()
goldfish_pipe: fix build warning about using too much stack.
goldfish_pipe: An implementation of more parallel pipe
fpga fr br: update supported version numbers
fpga: region: release FPGA region reference in error path
fpga altera-hps2fpga: disable/unprepare clock on error in alt_fpga_bridge_probe()
mei: drop the TODO from samples
firmware: Google VPD sysfs driver
firmware: Google VPD: import lib_vpd source files
misc: lkdtm: Add volatile to intentional NULL pointer reference
eeprom: idt_89hpesx: Add OF device ID table
misc: ds1682: Add OF device ID table
misc: tsl2550: Add OF device ID table
w1: Remove unneeded use of assert() and remove w1_log.h
w1: Use kernel common min() implementation
uio_mf624: Align memory regions to page size and set correct offsets
uio_mf624: Refactor memory info initialization
uio: Allow handling of non page-aligned memory regions
hangcheck-timer: Fix typo in comment
...
Callers of rxe_mem_copy() provide pointer to store updated CRC
value. That pointer was supposed to be updated, but the
commit cee2688e3c ("IB/rxe: Offload CRC calculation when possible")
mistakenly removed that assignment for RXE_MEM_TYPE_DMA memory type.
The code worked because there are no actual callers with
RXE_MEM_TYPE_DMA, who are interested in returned value of crcp.
The one caller in read_reply(), who uses the returned crcp didn't
set RXE_MEM_TYPE_DMA as mem->type.
Fixes: cee2688e3c ("IB/rxe: Offload CRC calculation when possible")
Reported-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The error path for context initialization is not consistent. Cleanup all
resources on failure.
Removed unused variable user_event_mask.
Add the _BASE_FAILED bit to the event flags so that a base context can
notify waiting sub contexts that they cannot continue.
Running out of sub contexts is an EBUSY result, not EINVAL.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The current algorithm for generating sub-context IDs is FILO. If the
contexts are not closed in that order, the uniqueness of the ID will be
compromised. For example, logging the creation/deletion of context IDs with
an application that assigns and closes in a FIFO order reveals:
cache_id: assign: uctxt: 3 sub_ctxt: 0
cache_id: assign: uctxt: 3 sub_ctxt: 1
cache_id: assign: uctxt: 3 sub_ctxt: 2
cache_id: close: uctxt: 3 sub_ctxt: 0
cache_id: assign: uctxt: 3 sub_ctxt: 2 <<<
The sub_ctxt ID 2 is reused incorrectly.
Update the sub-context ID assign algorithm to use a bitmask of in_use
contexts. The new algorithm will allow the contexts to be closed in any
order, and will only re-use unused contexts.
Size subctxt and subctxt_cnt to match the user API size.
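A minimal sketch of the bitmask approach described above. The structure,
the function names and the limit of 8 slots are hypothetical and are not
taken from the driver; the point is only that an ID is handed out when its
bit is clear, regardless of the order in which earlier IDs were closed.

```c
#include <assert.h>
#include <stdint.h>

#define EX_MAX_SUBCTXTS 8	/* hypothetical limit, not the driver's */

/* One bit per sub-context slot; a set bit means the ID is in use. */
struct ex_subctxt_ids {
	uint8_t in_use;
};

/* Return the lowest free ID and mark it used, or -1 if all are taken. */
static int ex_subctxt_assign(struct ex_subctxt_ids *s)
{
	for (int id = 0; id < EX_MAX_SUBCTXTS; id++) {
		if (!(s->in_use & (1u << id))) {
			s->in_use |= (uint8_t)(1u << id);
			return id;
		}
	}
	return -1;
}

/* Release an ID so that a later assign may hand it out again. */
static void ex_subctxt_close(struct ex_subctxt_ids *s, int id)
{
	assert(id >= 0 && id < EX_MAX_SUBCTXTS);
	s->in_use &= (uint8_t)~(1u << id);
}
```

Replaying the FIFO sequence from the log above (assign 0, 1, 2, close 0,
assign) with this sketch hands out ID 0 again instead of reusing ID 2.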
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Context initialization mixes base context init with sub context init.
This is bad because contexts can be reused, and on reuse this reinitializes
things that should not be reinitialized.
Normalize comments and function names to refer to base context and
sub context (not main, shared or slaves).
Separate the base context initialization from sub context initialization.
hfi1_init_ctxt() cannot return an error, so change it to return void and
remove the error message.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In the close path the context is removed from the device array, and then
the clear pkey function is called. The pkey function tries to get the
context from the device array, but because it was removed the clearing
does not occur.
Rework pkey clear function to work as expected. Update the function
variable to reflect the correct size and name of the hw_context.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The search for available shared contexts walks each registered hfi1
device. This search is too broad because other devices may not
be on the same fabric, and using their contexts could cause unexpected
behavior.
Removed walking the list of devices, limiting the search to the opened
device. With the device walk removed, the hfi1_devdata (dd) is not
available. Added it to the hfi1_filedata for reference.
With this change, hfi1_count_units() was rendered obsolete and was
removed.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The AHG index is only accessed in the request call
from user space, so there's no need for atomic semantics.
Replace atomic operations for SDMA_REQ_HAVE_AHG bit
with a test of the AHG index.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Since almost all functions that use the hfi1_filedata get the pointer
from the file pointer, simplify by only passing the hfi1_filedata pointer.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
To improve the readability of function prototypes, give the parameters
names.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The only context that frees user_exp_rcv data structures is the last
context closed (from a sub-context set). This leaks the allocations
from the other sub-contexts. Separate the common frees from the
specific frees and call them at the appropriate time.
Using KEDR to check for memory leaks we get:
Before test:
[leak_check] Possible leaks: 25
After test:
[leak_check] Possible leaks: 31 (6 leaked data structures)
After patch applied (before and after test have the same value)
[leak_check] Possible leaks: 25
Each leak is 192 + 13440 + 6720 = 20352 bytes per sub-context.
Cc: stable@vger.kernel.org
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If the eager buffer allocation fails, it is necessary to return
an error code.
Cc: stable@vger.kernel.org
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Performance analysis shows benefits for PSM2 in increasing eager buffer
size from 2MB to 8MB. The change has neutral impact on verbs.
Make change to the module parameter's default value. Allocation
ring down was verified to work with the larger buffer size.
Reviewed-by: Tadeusz Struk <tadeusz.struk@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Tymoteusz Kielan <tymoteusz.kielan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Div instructions show up as costly in profiles when
the tx request header is set. Using right shift
instead of a divide operation reduces the cycles
spent in the function that sets the tx request
header as shown in the profile. Use right shift
operation instead.
Profile before change:
43.24% 009
|
|--23.41%-- user_sdma_send_pkts
| |
| |--99.90%-- hfi1_user_sdma_process_request
Profile after change:
45.75% 009
|
|--14.81%-- user_sdma_send_pkts
| |
| |--99.95%-- hfi1_user_sdma_process_request
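A right shift can only stand in for a divide when the divisor is a power of
two. The sketch below assumes that is the case; the helper name is
hypothetical and the actual divisor used in the driver is not stated in
this message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Divide an unsigned value by a power-of-two divisor using a right shift.
 * shift is log2 of the divisor, e.g. 2 for a divide by 4.
 */
static inline uint32_t ex_div_pow2(uint32_t value, unsigned int shift)
{
	assert(shift < 32);	/* shifting by >= width is undefined in C */
	return value >> shift;
}
```

For unsigned operands the shift and the divide give the same truncated
result, so the change is safe as long as the divisor stays a power of two.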
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When there are many RC QPs and an RDMA READ request
is sent, timeouts occur on the requester side because
of fairness among RC QPs on their relative SDMA engine
on the responder side. This also hits write and send, but
to a lesser extent.
Complicating the issue is that the current code checks if workqueue
is congested before scheduling other QPs, however, this
check is based on the number of active entries in the
workqueue, which was found to be too big for
workqueue_congested() to be effective.
Fix by reducing the number of active entries as revealed by
experimentation from the default of num_sdma to
HFI1_MAX_ACTIVE_WORKQUEUE_ENTRIES. Retry counts were monitored
to determine the correct value.
Tracing to investigate any future issues is also added.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This field is causing excessive cache line bouncing.
There are spare bytes in the r_lock cache line so the best approach
is to make an rvt QP field and remove from the hfi1 priv field.
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In goto_offline() function pstate is masked by 0xff when compared
to PLS_OFFLINE state. Mask should be 0xf0, since upper 4 bits
specify the "major" state.
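The effect of the two masks can be sketched as follows. The constant values
and names are illustrative assumptions, not copied from the driver; the
sketch only assumes an 8-bit state whose upper nibble is the major state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_PLS_OFFLINE		0x90u	/* illustrative major-state value */
#define EX_MAJOR_STATE_MASK	0xf0u	/* upper 4 bits: major state */

/* Compare only the major state, ignoring the minor state in the low nibble. */
static bool ex_is_offline(uint32_t pstate)
{
	assert(pstate <= 0xffu);	/* 8-bit state in this sketch */
	return (pstate & EX_MAJOR_STATE_MASK) == EX_PLS_OFFLINE;
}

/* The old comparison: a full 0xff mask only matches minor state 0. */
static bool ex_is_offline_full_mask(uint32_t pstate)
{
	assert(pstate <= 0xffu);
	return (pstate & 0xffu) == EX_PLS_OFFLINE;
}
```

With these assumed values, a state of 0x93 is recognized as offline by the
0xf0 comparison but missed by the 0xff comparison.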
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In order to let the bonding driver report the correct speed
of the underlying interfaces, when they are IPoIB, the ethtool
function get_link_ksettings() in the IPoIB driver is implemented.
Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Suggested-by: Håkon Bugge <Haakon.Bugge@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Enable mlx5 IPoIB acceleration by declaring
mlx5_ib_{alloc,free}_rdma_netdev and assigning the mlx5
IPoIB rdma_netdev callbacks.
In addition, this patch brings in sync mlx5's IPoIB parts for net and IB
trees. As a precaution, we disabled IPoIB acceleration by default (in
the mlx5_core Kconfig file).
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
With commit eea40b8f62 ("infiniband: call ipv6 route lookup
via the stub interface"), if the route lookup fails due to
ipv6 being disabled, the dst variable is left untouched, and
the following dst_release() may access uninitialized memory.
Since ipv6_dst_lookup() always sets dst to NULL in case of
lookup failure with ipv6 enabled, fix the above by just
returning the error code if the lookup fails.
Fixes: eea40b8f62 ("infiniband: call ipv6 route lookup via the stub interface")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add port_xmit_wait to the error counters read by mlx5_ib_process_mad to
ensure sysfs port counter provides correct value for PortXmitWait.
Otherwise the sysfs port_xmit_wait file always contains zero.
The previous MAD_IFC implementation populated this counter, but it was
removed during the migration to PPCNT for error counters (32-bit only).
Signed-off-by: Tim Wright <tim@binbash.co.uk>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In write to debugfs file 'resource_stats' the local buffer 'tmp_str' is
written at index 'count-1' where 'count' is the size of the write, so
potentially 0.
This patch filters odd values for the write size/position to avoid this
type of problem.
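A user-space sketch of the kind of check described above. The buffer size,
the function name and the exact conditions are assumptions for
illustration; the real handler is not shown in this message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define EX_TMP_LEN 16	/* hypothetical size of the local buffer */

/*
 * Copy a write payload into a local buffer and NUL-terminate it at
 * count - 1, rejecting sizes and positions that would index outside it.
 */
static long ex_handle_write(const char *buf, size_t count, long long pos,
			    char out[EX_TMP_LEN])
{
	assert(buf != NULL && out != NULL);

	if (pos != 0 || count == 0 || count > EX_TMP_LEN)
		return -EINVAL;

	memcpy(out, buf, count);
	out[count - 1] = '\0';	/* safe: 0 <= count - 1 < EX_TMP_LEN */
	return (long)count;
}
```

Without the count == 0 check, count - 1 wraps around for an unsigned size
and the terminator is written far outside the buffer.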
Signed-off-by: Michael Mera <dev@michaelmera.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The last two actual parameters when calling id_map_find_by_sl_id()
from id_map_get() are swapped. However, the same formal parameters to
id_map_get() have them swapped as well, inverting the effect of the
first error.
This commit improves readability, but makes no functional change to
the code.
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Change and simplify the code to match the variable name. This commit
improves readability but makes no functional change to the code.
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Suggested-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In case we got an initial sg_offset, we need to
account for it in the mr length.
Cc: stable@vger.kernel.org
Fixes: ff2ba99365 ("IB/core: Add passing an offset into the SG to ib_map_mr_sg")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When reading a RDMA WRITE FIRST packet we copy the DMA length from the RDMA
header into the qp->resp.resid variable for later use. Later in check_rkey()
we clamp it to the MTU if the packet is an RDMA WRITE packet and has a
residual length bigger than the MTU. Later in write_data_in() we subtract the
payload of the packet from the residual length. If the packet happens to have a
payload of exactly the MTU size we end up with a residual length of 0 despite
the packet not being the last in the conversation. When the next packet in the
conversation arrives, we don't have any residual length left and thus set the QP
into an error state.
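The arithmetic of the failure can be sketched as below. This models only
the sequence described above (clamp to the MTU, then subtract the payload);
the function name and the example sizes are hypothetical, and the sketch
does not show the fix itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Residual length after the first packet of an RDMA WRITE, following the
 * buggy sequence: the residual is clamped to the MTU before the payload
 * is subtracted.
 */
static uint32_t ex_resid_after_first(uint32_t dma_len, uint32_t mtu,
				     uint32_t payload)
{
	uint32_t resid = dma_len;

	if (resid > mtu)
		resid = mtu;		/* clamp, as done in the key check */

	assert(payload <= resid);	/* avoid unsigned underflow */
	return resid - payload;
}
```

For an assumed 8192-byte write over a 1024-byte MTU, the first full-MTU
packet leaves a residual of 0 rather than the 7168 bytes still outstanding.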
This broke NVMe over Fabrics functionality over rdma_rxe.ko
The patch was verified using the following test.
# echo eth0 > /sys/module/rdma_rxe/parameters/add
# nvme connect -t rdma -a 192.168.155.101 -s 1023 -n nvmf-test
# mkfs.xfs -fK /dev/nvme0n1
meta-data=/dev/nvme0n1 isize=256 agcount=4, agsize=65536 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=0 finobt=0, sparse=0
data = bsize=4096 blocks=262144, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
# mount /dev/nvme0n1 /tmp/
[ 148.923263] XFS (nvme0n1): Mounting V4 Filesystem
[ 148.961196] XFS (nvme0n1): Ending clean mount
# dd if=/dev/urandom of=test.bin bs=1M count=128
128+0 records in
128+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.437991 s, 306 MB/s
# sha256sum test.bin
cde42941f045efa8c4f0f157ab6f29741753cdd8d1cff93a6b03649d83c4129a test.bin
# cp test.bin /tmp/
# sha256sum /tmp/test.bin
cde42941f045efa8c4f0f157ab6f29741753cdd8d1cff93a6b03649d83c4129a /tmp/test.bin
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When the bit 26 of capmask2 field in OPA classport info
query is set, SA will query for OPA path records instead
of querying for IB path records. Note that OPA
path records can only be queried by kernel ULPs.
Userspace clients continue to query IB path records.
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add opa_sa_path_rec to sa_path_rec data structure.
The 'type' field in sa_path_rec identifies the
type of the path record.
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
sa_path_rec now contains a union of sa_path_rec_ib and sa_path_rec_roce
based on the type of the path record. Note that fields applicable to
path record type ROCE v1 and ROCE v2 fall under sa_path_rec_roce.
Accessor functions are added to these fields so the caller doesn't have
to know the type.
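A simplified sketch of the accessor pattern. All names carry an ex_ prefix
and the field layout is a hypothetical stand-in, not the actual definition
of sa_path_rec; it only shows how an accessor hides the type check from
the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum ex_rec_type { EX_REC_IB, EX_REC_ROCE_V1, EX_REC_ROCE_V2 };

struct ex_path_rec_ib   { uint16_t dlid; uint16_t slid; };
struct ex_path_rec_roce { uint8_t dmac[6]; };

struct ex_path_rec {
	enum ex_rec_type rec_type;
	union {
		struct ex_path_rec_ib ib;
		struct ex_path_rec_roce roce;
	} u;
};

/* Setter: silently ignored for record types that have no DLID. */
static void ex_path_set_dlid(struct ex_path_rec *rec, uint16_t dlid)
{
	assert(rec != NULL);
	if (rec->rec_type == EX_REC_IB)
		rec->u.ib.dlid = dlid;
}

/* Getter: callers ask for the DLID without checking rec_type themselves. */
static uint16_t ex_path_get_dlid(const struct ex_path_rec *rec)
{
	assert(rec != NULL);
	if (rec->rec_type == EX_REC_IB)
		return rec->u.ib.dlid;
	return 0;	/* field does not apply to RoCE records */
}
```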
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
struct sa_path_rec has a gid_type field. This patch introduces a more
generic path record specific type 'rec_type' which is either IB, ROCE v1
or ROCE v2. The patch also provides conversion functions to get
a gid type from a path record type and vice versa.
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Rename ib_sa_path_rec to a more generic sa_path_rec.
This is part of extending ib_sa to also support OPA
path records in addition to the IB defined path records.
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch adds parentheses around parameters to sizeof
as called out by checkpatch
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
rdma_ah_attr can now be either ib or roce allowing
core components to use one type or the other and also
to define attributes unique to a specific type. struct
ib_ah is also initialized with the type when it's first
created. This ensures that calls such as modify_ah
don't modify the type of the address handle attribute.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Modify core and driver components to use accessor functions
introduced to access individual fields of rdma_ah_attr
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Functions pvrdma_ah_attr_to_ib and ib_ah_attr_to_pvrdma have
been renamed so they are in sync with the rename of the
ib_ah_attr structure
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Local function to_ib_ah_attr is renamed so it is in
sync with the rename of the ib_ah_attr structure
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Local function to_ib_ah_attr is renamed so it is in
sync with the rename of the ib_ah_attr structure
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Local function to_ib_ah_attr is renamed so it is in
sync with the rename of the ib_ah_attr structure
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Rename ib_destroy_ah to rdma_destroy_ah so it's in sync with the
rename of the ib address handle attribute
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Rename ib_query_ah to rdma_query_ah so it's in sync with the
rename of the ib address handle attribute
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Rename ib_modify_ah to rdma_modify_ah so it's in sync with the
rename of the ib address handle attribute
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Rename ib_create_ah to rdma_create_ah so it's in sync with the
rename of the ib address handle attribute
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch simply renames struct ib_ah_attr to
rdma_ah_attr as these fields specify attributes that are
not necessarily specific to IB.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Zero out ib_ah_attr before calling query_ah. Set ah_flags
appropriately.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Read/write grh fields of the ah_attr only if the
ah_flags field has the IB_AH_GRH bit enabled
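A minimal sketch of the guard, using hypothetical ex_ structures rather
than the real address handle attribute; the flag value is an assumption
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_AH_GRH 0x1u	/* illustrative flag: the GRH fields are valid */

struct ex_grh {
	uint8_t sgid_index;
	uint8_t hop_limit;
};

struct ex_ah_attr {
	uint8_t ah_flags;
	struct ex_grh grh;
};

/*
 * Read a GRH field only when the flag says the GRH is present.
 * Returns 0 on success, -1 if the attribute carries no GRH.
 */
static int ex_ah_get_hop_limit(const struct ex_ah_attr *attr,
			       uint8_t *hop_limit)
{
	assert(attr != NULL && hop_limit != NULL);

	if (!(attr->ah_flags & EX_AH_GRH))
		return -1;	/* grh contents are not meaningful */

	*hop_limit = attr->grh.hop_limit;
	return 0;
}
```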
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch adds parentheses around parameters to sizeof
as called out by checkpatch
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch fixes a checkpatch issue related to not having
to use an 'else' if the 'if' path returns from the function.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Address a checkpatch issue on missing identifier names
on function definitions.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
For OPA devices, SA will query the OPA classport info
instead of the IB defined classport info.
opa classport info exposes additional information and
capabilities that are specific to OPA devices.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The include file does not need any PCI specifics, so remove
that include. Also fix the places that relied on it.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Both opa_vnic and the hfi driver use the same opa_classport_info
definition. We will also have ib_sa capable of querying opa class
port info and would need this definition. Move it to ib_mad.h
for everyone to use.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
SA will query and cache class port info as part of
its initialization. SA will also invalidate and
refresh the cache based on specific events. Callers such
as IPoIB and CM can query the SA to get the classportinfo
information. Apart from making the caller code much simpler,
this change puts the onus on the SA to query and maintain
classportinfo much like how it maintains the address handle to the SM.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moving these will facilitate changes to these in the
next patches. This is strictly a move and there are no
changes to the functions in any way.
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This fixes a checkpatch issue. The fix is needed
so that some of these functions can be moved around
in the forthcoming patches
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This fixes a checkpatch issue. The fix is needed
so that some of these functions can be moved around
in the forthcoming patches
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This fixes a checkpatch issue. The fix is needed
so that some of these functions can be moved around
in the forthcoming patches
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The process_ecn intends to return a bool value. However it is doing
so incorrectly by ANDing the fecn mask. The fecn bit is bit 31. Bool is
not a native data type and is up to the compiler to implement how it
sees fit. It is conceivable that this upper bit gets washed out.
Fix by converting to a bool properly.
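The concern can be illustrated with a narrow integer standing in for the
boolean type. The names below are hypothetical; the sketch only assumes
that FECN is bit 31 of a 32-bit word, as stated above.

```c
#include <assert.h>
#include <stdint.h>

#define EX_FECN_MASK (1u << 31)	/* FECN is bit 31 */

typedef uint8_t ex_narrow_bool;	/* stand-in for a narrow boolean type */

/* Truncating conversion: bit 31 does not fit in 8 bits and is lost. */
static ex_narrow_bool ex_fecn_truncated(uint32_t bth1)
{
	assert(sizeof(ex_narrow_bool) < sizeof(bth1));
	return (ex_narrow_bool)(bth1 & EX_FECN_MASK);
}

/* Explicit normalization to 0 or 1 survives any boolean representation. */
static ex_narrow_bool ex_fecn_normalized(uint32_t bth1)
{
	return !!(bth1 & EX_FECN_MASK);
}
```

With the FECN bit set, the truncating version returns 0 while the
normalized version returns 1.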
Cc: stable@vger.kernel.org
Fixes: Commit fd2b562edca6 ("IB/hfi1: Pull FECN/BECN processing to a common place")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The device/port status is not intended to be changed from user space.
Prevent a user from mapping them as write or execute.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add missing braces around else blocks in a few places to make checkpatch
happy.
Fixes: 7724105686 ("IB/hfi1: add driver files")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
According to checkpatch %Lx is not standard C so remove it and use the
suggested %llx.
Fixes: 7724105686 ("IB/hfi1: add driver files")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Checkpatch flagged a misspelled word. Fix it.
Fixes: 8764522e52 ("staging/rdma/hfi1: Unexpected link up pkey values are not an error")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ingress and egress port P_Key checking should always be performed for
HFIs. This patch will enable ingress and egress P_Key checking when
the port is initialized and will ignore the P_Key information sent by
the FM in the port info structure which is meant to be used only by the
switch.
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Neel Desai <neel.desai@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Secure data is transferred across the link during verify
cap. This includes Neighbor Guid, Type, and Port Number.
This transfer is not guaranteed to complete until the 8051
firmware has completed processing of the state_complete
frame. Move the consumption of this data from verify cap
handling to link up handling to ensure the data is finalized.
Additionally, do not notify the SM that the link is up until
after this data is actually available.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Stuart Summers <john.s.summers@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When we receive a QSFP_HIGH_TEMP_ALARM or QSFP_HIGH_TEMP_WARNING
interrupt, print a "QSFP cable temperature too high" message.
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Neel Desai <neel.desai@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The driver progress routines can call cond_resched() when
a timeslice is exhausted and irqs are enabled.
If the ULP had been holding a spin lock without disabling irqs and
the post send directly called the progress routine, the cond_resched()
could yield allowing another thread from the same ULP to deadlock
on that same lock.
Correct by replacing the current hfi1_do_send() calldown with a unique
one for post send and adding an argument to hfi1_do_send() to indicate
that the send engine is running in a thread. If the routine is not
running in a thread, avoid calling cond_resched().
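A user-space sketch of the calling convention described above. The
function names are hypothetical, ex_yield() is a stand-in for
cond_resched(), and the timeslice check is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int ex_yield_count;

/* Stand-in for cond_resched(): only legal when the caller may sleep. */
static void ex_yield(bool may_sleep)
{
	assert(may_sleep);
	ex_yield_count++;
}

/* Common send engine; in_thread tells it whether yielding is allowed. */
static void ex_do_send(unsigned int packets, bool in_thread)
{
	for (unsigned int i = 0; i < packets; i++) {
		/* ... build and send one packet ... */
		if (in_thread)
			ex_yield(in_thread);
	}
}

/* Entry point for post send: the caller may hold a spin lock. */
static void ex_do_send_from_post_send(unsigned int packets)
{
	ex_do_send(packets, false);
}

/* Entry point for the worker thread: yielding is safe here. */
static void ex_do_send_from_worker(unsigned int packets)
{
	ex_do_send(packets, true);
}
```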
CC: <stable@vger.kernel.org> # 4.7.x-
Fixes: Commit 831464ce4b ("IB/hfi1: Don't call cond_resched in atomic mode when sending packets")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Move FECN and BECN related defines to common header files
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
These inline functions improve code readability by
enabling callers to read specific fields from the
header without knowledge of byte offsets.
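As an illustration of the accessor style, the sketch below wraps byte offsets and masks behind small inline helpers. The 8-byte layout and all demo_* names are assumptions made for this example and are not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical 8-byte header; the field positions are illustrative only. */
struct demo_hdr {
	uint8_t bytes[8];
};

/* Read a big-endian 16-bit value from two consecutive bytes. */
static inline uint16_t demo_get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Upper nibble of byte 1. */
static inline uint8_t demo_hdr_get_sl(const struct demo_hdr *h)
{
	return (h->bytes[1] >> 4) & 0xF;
}

/* Low two bits of byte 1. */
static inline uint8_t demo_hdr_get_lnh(const struct demo_hdr *h)
{
	return h->bytes[1] & 0x3;
}

/* Bytes 2-3. */
static inline uint16_t demo_hdr_get_dlid(const struct demo_hdr *h)
{
	return demo_get_be16(&h->bytes[2]);
}

/* Bytes 6-7. */
static inline uint16_t demo_hdr_get_slid(const struct demo_hdr *h)
{
	return demo_get_be16(&h->bytes[6]);
}
```

Callers then write demo_hdr_get_dlid(hdr) instead of repeating the offset and byte-order handling at each use.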
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The function really returned the 5-bit sc value from
the header and rhf. hdr2sc didn't quite describe what it did.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
VL15 in the SC2VL table is used to indicate an invalid SC
for the FM. Internally, however, the driver remaps SCs from
VL15 to ILLEGAL_VL to prevent error counts. This mapping
confuses the FM when performing a sweep, making it return
a table mismatch error. Have SMA convert ILLEGAL_VL
to VL15 entries for the SC2VL table queries.
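A rough sketch of the query-side conversion, assuming the internal table is copied into the reply entry by entry. The function name and the numeric value chosen for DEMO_ILLEGAL_VL are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_VL15        15
#define DEMO_ILLEGAL_VL  12	/* assumed internal marker value */

/*
 * Copy the internal SC2VL table into a query reply, reporting the
 * internal marker as VL15. The internal table is left unchanged.
 */
static void demo_get_sc2vl(const uint8_t *internal, uint8_t *reply, size_t n)
{
	for (size_t i = 0; i < n; i++)
		reply[i] = (internal[i] == DEMO_ILLEGAL_VL) ?
			   DEMO_VL15 : internal[i];
}
```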
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Improve the safety of the code by validating the user supplied
tidcnt before use.
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The Infiniband spec defines "A multicast address is defined by a
MGID and a MLID" (section 10.5).
The current code only uses the MGID for identifying multicast groups.
Update the driver to be compliant with this definition.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The Infiniband spec defines "A multicast address is defined by a
MGID and a MLID" (section 10.5). Currently the MLID value is not
validated.
Add check to verify that the MLID value is in the correct address
range.
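A sketch of such a range check, assuming the usual InfiniBand convention that multicast LIDs start at 0xC000 and that 0xFFFF is the permissive LID. The helper name and constants are written out here for illustration and are not copied from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MULTICAST_LID_BASE 0xC000u	/* assumed start of the MLID range */
#define DEMO_LID_PERMISSIVE     0xFFFFu	/* assumed permissive LID */

/* Return true when lid falls inside the multicast LID range. */
static bool demo_mlid_is_valid(uint16_t lid)
{
	return lid >= DEMO_MULTICAST_LID_BASE && lid != DEMO_LID_PERMISSIVE;
}
```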
Fixes: 0c33aeedb2 ("[IB] Add checks to multicast attach and detach")
Cc: stable@vger.kernel.org
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The FM uses the values of MulticastMask and CollectiveMask to
determine the number of bits for net masks. The current values of
0 and 0 are incorrect. The values should be 4 and 1. Updated the
necessary code to reflect the specified values.
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
A list of MGID/MLID pairs is built when doing a multicast attach. When
the multicast detach is called, the list is searched, and regardless of
the search outcome, the driver detach is called.
If an MGID/MLID pair is not on the list, driver detach should not be
called, and an error should be returned. Calling the driver without
removing an MGID/MLID pair from the list can leave the core and driver
out of sync.
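The intended detach behavior can be modelled in user space as follows. This is a sketch only: the fixed-size array, the demo_* names, and the -EINVAL return code are assumptions, and a counter stands in for the call into the driver.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_GROUPS 8

struct demo_mcast {
	uint8_t mgid[16];
	uint16_t mlid;
	bool in_use;
};

struct demo_qp {
	struct demo_mcast groups[DEMO_MAX_GROUPS];
	int driver_detach_calls;	/* counts calls into the modelled driver */
};

/* Look up an MGID/MLID pair; both values must match. */
static struct demo_mcast *demo_find(struct demo_qp *qp,
				    const uint8_t *mgid, uint16_t mlid)
{
	for (int i = 0; i < DEMO_MAX_GROUPS; i++) {
		struct demo_mcast *m = &qp->groups[i];

		if (m->in_use && m->mlid == mlid &&
		    memcmp(m->mgid, mgid, sizeof(m->mgid)) == 0)
			return m;
	}
	return NULL;
}

static int demo_attach(struct demo_qp *qp, const uint8_t *mgid, uint16_t mlid)
{
	if (demo_find(qp, mgid, mlid))
		return 0;	/* already tracked */
	for (int i = 0; i < DEMO_MAX_GROUPS; i++) {
		struct demo_mcast *m = &qp->groups[i];

		if (!m->in_use) {
			memcpy(m->mgid, mgid, sizeof(m->mgid));
			m->mlid = mlid;
			m->in_use = true;
			return 0;
		}
	}
	return -ENOMEM;
}

static int demo_detach(struct demo_qp *qp, const uint8_t *mgid, uint16_t mlid)
{
	struct demo_mcast *m = demo_find(qp, mgid, mlid);

	if (!m)
		return -EINVAL;	/* not on the list: do not call the driver */
	m->in_use = false;
	qp->driver_detach_calls++;	/* stands in for the driver detach */
	return 0;
}
```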
Fixes: f4e401562c ("IB/uverbs: track multicast group membership for userspace QPs")
Cc: stable@vger.kernel.org
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use setup_timer() instead of init_timer() to simplify the code.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use setup_timer() instead of init_timer() to simplify the code.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use setup_timer() instead of init_timer() to simplify the code.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fix a type mismatch: wqe_words are in le32 format, while opcode
is in CPU format.
The following sparse warnings helped to find it:
drivers/infiniband/hw/nes/nes_hw.c:3058:24: warning: incorrect type in assignment (different base types)
drivers/infiniband/hw/nes/nes_hw.c:3058:24: expected unsigned int [unsigned] [assigned] [usertype] opcode
drivers/infiniband/hw/nes/nes_hw.c:3058:24: got restricted __le32 <noident>
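To illustrate the class of bug, the user-space sketch below converts a little-endian word to CPU order before extracting a field from it. The helper names and the opcode mask are hypothetical; in the kernel the conversion would be done with le32_to_cpu().

```c
#include <stdint.h>

#define DEMO_OPCODE_MASK 0x3fu	/* hypothetical opcode field mask */

/* Decode a little-endian 32-bit value regardless of host byte order. */
static uint32_t demo_le32_to_cpu(const uint8_t b[4])
{
	return (uint32_t)b[0] |
	       ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) |
	       ((uint32_t)b[3] << 24);
}

/* Convert first, then mask: the mask is only meaningful in CPU order. */
static uint32_t demo_wqe_opcode(const uint8_t word[4])
{
	return demo_le32_to_cpu(word) & DEMO_OPCODE_MASK;
}
```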
CC: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Simplify the code in find_free_vf_and_create_qp_grp() to avoid a sparse
error about unlock being called in a different block than lock.
drivers/infiniband/hw/usnic/usnic_ib_verbs.c:206:9: warning: context imbalance
in 'find_free_vf_and_create_qp_grp' - different lock
contexts for basic block
CC: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The sparse tool complains about undeclared symbols in usnic_ib_verbs.c
and usnic_ib_sysfs.c. This is caused by the lack of a direct include of
the appropriate usnic_ib_verbs.h and usnic_ib_sysfs.h headers, where all
these functions are declared.
A simple include eliminates 30 warnings similar to the one below:
drivers/infiniband/hw/usnic/usnic_ib_sysfs.c:304:6: warning: symbol
'usnic_ib_sysfs_unregister_usdev' was
not declared. Should it be static?
CC: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Functions declared in uverbs_std_types.c are local to that file, but
they lack static declarations. This produces a lot of sparse warnings,
like the one below:
drivers/infiniband/core/uverbs_std_types.c:41:5: warning: symbol
'uverbs_free_ah' was not declared.
Should it be static?
So mark them as static.
CC: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Function alloc_skb() will return a NULL pointer when there is not enough
memory. However, the return value of alloc_skb() is directly used
without validation in function send_fw_pass_open_req(). This patch
checks the return value of alloc_skb() against NULL.
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
We need to call spin_unlock_irqrestore() instead of vanilla
spin_unlock() on this error path.
Fixes: 119a8e708d ("IB/rdmavt: Add AH to rdmavt")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The infiniband address handle can be triggered to resolve an ipv6
address in response to MAD packets, regardless of the ipv6
module being disabled via the kernel command line argument.
That will cause a call into the ipv6 routing code, which is not
initialized, and a consequent oops.
This commit addresses the above issue by replacing the direct lookup
call with an indirect one via the ipv6 stub, which is properly
initialized according to the ipv6 status (e.g. if ipv6 is
disabled, the routing lookup fails gracefully).
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Split the poll responder CQ into two functions.
Add support for send+invalidate in poll CQ.
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Wait for all relevant CNQ interrupts before freeing the CQ.
Don't invoke completion handlers for a destroyed CQ.
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
After checking the path upwards towards the root complex, actually check
the root complex atomic_req capability, and not that of our own NIC.
Verify that the PCIe device control register's atomic egress block
is cleared in the path.
Verify that the PCIe version is at least 2.
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
simple_fill_super() is passed an array of tree_descr structures which
describe the files to create in the filesystem's root directory. Since
these arrays are never modified intentionally, they should be 'const' so
that they are placed in .rodata and benefit from memory protection.
This patch updates the function signature and all users, and also
constifies tree_descr.name.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Internally, an MW is implemented as a KLM MKey and filled by userspace UMR
postsends. Handle page faults triggered by operations on these MKeys.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
To make the page fault handling code more flexible,
split the pagefault_single_data_segment() function.
Keep MR resolution in pagefault_single_data_segment() and
move actual updates into pagefault_single_mr().
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add the IB_ACCESS_HUGETLB ib_reg_mr flag.
A hugetlb region registered with this flag
will use a single translation entry per huge page.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Currently ODP supports only regular MMU pages.
Add ODP support for regions consisting of physically contiguous chunks
of arbitrary order (huge pages for instance) to improve performance.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Currently ODP supports only regular MMU pages.
Add ODP support for regions consisting of physically contiguous chunks
of arbitrary order (huge pages for instance) to improve performance.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When an implicit MR's leaf MKey becomes unused, i.e. when its
last page is released by MMU invalidation, it is marked as "dying"
and scheduled for release by the garbage collector.
Currently a subsequent page fault may remove the "dying" flag.
Treat a leaf MKey as non-existent once it has been scheduled for removal
by GC.
Fixes: 81713d3788 ('IB/mlx5: Add implicit MR support')
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Translation table updates of a large UMR may require multiple post send
operations. The last operation can have a different length, but the current
code sets them all to the same length.
Fixes: 7d0cc6edcc ('IB/mlx5: Add MR cache for large UMR regions')
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In the memory shortage path we fall back to using a spare buffer.
mlx5_ib_update_xlt() is called from ib_uverbs_reg_mr when ibmr.ucontext
is not initialized yet.
Test scenario:
1. trigger memory exhaustion so __get_free_pages(GFP_KERNEL, 4) will fail
2. register MR
3. there should be no kernel oops
Fixes: 7d0cc6edcc ('IB/mlx5: Add MR cache for large UMR regions')
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The page size is held by struct ib_umem in the page_size field.
It is better to store it as an exponent, because a page size by nature
is always a power of two and is used as a factor, divisor or ilog2's argument.
Converting page_size to page_shift makes the code portable and
avoids the following error while compiling on ARM:
ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined!
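A small sketch of why the exponent form helps: with a shift, page counts and offsets on 64-bit lengths need no division, so no 64-bit divide helper has to be linked in on 32-bit targets. The helper names are hypothetical.

```c
#include <stdint.h>

/* Number of pages needed to cover `length` bytes, rounding up. */
static uint64_t demo_num_pages(uint64_t length, unsigned int page_shift)
{
	uint64_t page_mask = ((uint64_t)1 << page_shift) - 1;

	return (length + page_mask) >> page_shift;
}

/* Offset of `addr` inside its page. */
static uint64_t demo_page_offset(uint64_t addr, unsigned int page_shift)
{
	return addr & (((uint64_t)1 << page_shift) - 1);
}
```

The equivalent expressions written with page_size would be length / page_size and addr % page_size, which are 64-bit divisions.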
CC: Selvin Xavier <selvin.xavier@broadcom.com>
CC: Steve Wise <swise@chelsio.com>
CC: Lijun Ou <oulijun@huawei.com>
CC: Shiraz Saleem <shiraz.saleem@intel.com>
CC: Adit Ranadive <aditr@vmware.com>
CC: Dennis Dalessandro <dennis.dalessandro@intel.com>
CC: Ram Amrani <Ram.Amrani@Cavium.com>
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The function ib_unregister_mad_agent always returns zero, and
this returned value is not checked. As such, change the return
type to void.
CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fix off by 1 error in comments documenting the sdma and send context
mappings.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The 'num_sge' variable is verified to be smaller than the 'sge_count'
variable; however, since both are user-controlled it's possible to cause
an integer overflow for the kmalloc multiply on 32-bit platforms
(num_sge and sge_count are both defined u32). By crafting an input that
causes a smaller-than-expected allocation it's possible to write
controlled data out-of-bounds.
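One common way to close this kind of hole is to check the multiplication before allocating. The user-space sketch below shows the pattern; it is not the patch itself, and demo_alloc_array is a hypothetical name. In kernel code, kmalloc_array() performs a comparable overflow check.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate count * elem_size bytes, refusing requests whose size would
 * wrap around size_t instead of silently allocating a short buffer.
 */
static void *demo_alloc_array(size_t count, size_t elem_size)
{
	if (elem_size != 0 && count > SIZE_MAX / elem_size)
		return NULL;	/* the product would overflow */
	return malloc(count * elem_size);
}
```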
Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
hns_roce_v1_cq_set_ci() calls roce_set_bit() on an uninitialized field,
which will then change only a few of its bits, causing a warning with
the latest gcc:
infiniband/hw/hns/hns_roce_hw_v1.c: In function 'hns_roce_v1_cq_set_ci':
infiniband/hw/hns/hns_roce_hw_v1.c:1854:23: error: 'doorbell[1]' is used uninitialized in this function [-Werror=uninitialized]
roce_set_bit(doorbell[1], ROCEE_DB_OTHERS_H_ROCEE_DB_OTH_HW_SYNS_S, 1);
The code is actually correct since we always set all bits of the
port_vlan field, but gcc correctly points out that the first
access does contain uninitialized data.
This initializes the field to zero first before setting the
individual bits.
Fixes: 9a4435375c ("IB/hns: Add driver files for hns RoCE driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing,
awakening. In many cases it is unclear to say in which state
it is and sometimes it is done a wrong way.
The plan is to convert kthreads into kthread_worker or workqueues
API. It allows to split the functionality into separate operations.
It helps to make a better structure. Also it defines a clean state
where no locks are taken, IRQs blocked, the kthread might sleep
or even be safely migrated.
The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that it is
available when needed. Also it allows a better control, e.g.
define a scheduling priority.
This patch converts the frm_pool kthread into the kthread worker
API because I am not sure how busy the thread is. It is well
possible that it does not need a dedicated kthread and workqueues
would be perfectly fine. Well, the conversion between kthread
worker API and workqueues is pretty trivial.
The patch moves one iteration from the kthread into the work function.
It is queued only when there is a pending work. Therefore we do not
need to compare flush_ser and req_ser at the beginning. On the contrary,
the same work could be queued only once at a time. Therefore it has to
re-queue itself if some requests are pending.
Otherwise, wake_up_process() is replaced by queuing the work.
Important: The change is only compile tested. I did not find an easy
way to check it in real life.
Signed-off-by: Petr Mladek <pmladek@suse.com>
TO: Doug Ledford <dledford@redhat.com>
CC: Sean Hefty <sean.hefty@intel.com>
CC: Hal Rosenstock <hal.rosenstock@gmail.com>
CC: linux-rdma@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
This logic seems to be duplicated in (at least) three separate files.
Move it to one place so the code can be reused.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Trivial fix to a spelling mistake in an iser_err error message.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Enable the use of dsgl by default and determine whether dsgl is
supported from lld info.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Bharat Potnuri <bharat@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Constructs such as if (ptr && !IS_ERR(ptr)) can be shortened to
just !IS_ERR_OR_NULL(ptr) instead. Make substitutions in the bnxt_re
driver where appropriate.
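The equivalence can be checked with a user-space model of the error-pointer helpers. The demo_* functions below are simplified stand-ins written for this example (assuming error codes occupy the top 4095 values of the address space, as in the kernel); they are not the kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True when p encodes an errno in the top DEMO_MAX_ERRNO addresses. */
static inline bool demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-DEMO_MAX_ERRNO);
}

/* True when p is NULL or an encoded errno. */
static inline bool demo_is_err_or_null(const void *p)
{
	return !p || demo_is_err(p);
}
```

For any pointer p, (p && !demo_is_err(p)) and !demo_is_err_or_null(p) give the same answer, which is what makes the substitution safe.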
Signed-off-by: Doug Ledford <dledford@redhat.com>
rc is initialized to zero but is then updated by calls to
bnxt_qplib_free_fast_reg_page_list and/or bnxt_qplib_free_mrw
so the initialization is redundant and can be removed.
Detected with CoverityScan, CID#1408448 ("Unused Value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add missing calculation and translation of active_width and
active_speed for RoCE.
Fixes: 3f89a643eb ('IB/mlx5: Extend query_device/port to ...')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In case of an error, the properties reported to user
are zeroed out, so no need for a return value.
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add high data rate speed to the ib_port_speed enumeration.
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
There is a difference when parsing a completion entry between Ethernet
and IB ports. When link layer is Ethernet the bits describe the type of
L3 header in the packet. In the case when link layer is Ethernet and VLAN
header is present the value of SL is equal to the 3 UP bits in the VLAN
header. If VLAN header is not present then the SL is undefined and consumer
of the completion should check if IB_WC_WITH_VLAN is set.
While at it, this patch also fills the vlan_id field in the completion if
present.
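For reference, the standard 802.1Q tag control field carries the user priority in its top 3 bits and the VLAN ID in its low 12 bits. The sketch below extracts both from a host-order TCI value; the helper names are hypothetical, and how the driver actually reads these bits from the completion entry is not shown in this message.

```c
#include <stdint.h>

/* User priority (UP): top 3 bits of the 16-bit tag control field. */
static inline uint8_t demo_vlan_up(uint16_t tci)
{
	return (uint8_t)(tci >> 13);
}

/* VLAN ID: low 12 bits of the tag control field. */
static inline uint16_t demo_vlan_id(uint16_t tci)
{
	return tci & 0x0FFF;
}
```

A consumer would treat the UP value as the SL only when IB_WC_WITH_VLAN is set in the completion.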
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The current implementation of RDMA_CM sends MRA (Message Receipt
Acknowledgment) only for request messages but not for response messages.
As a result, a slow active side of the connection may send a ready-to-use
message to the passive side after a delay that is too long for the passive
side to wait for.
This patch adds a call to ib_send_cm_mra() upon receiving a response
message, which tells the other side to raise the service timeout
to a value 16 times larger than before. As in the request case, MRA
for a reply will be sent only if a duplicate response has arrived.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Matan Barak <matan@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch adds support to query the congestion related hardware counters
through new command and links them with other hw counters being available
in hw_counters sysfs location.
In order to reuse existing infrastructure it renames related q_counter
data structures to more generic counters to reflect q_counters and
congestion counters and maybe some other counters in the future.
New hardware counters:
* rp_cnp_handled - CNP packets handled by the reaction point
* rp_cnp_ignored - CNP packets ignored by the reaction point
* np_cnp_sent - CNP packets sent by notification point to respond to
CE marked RoCE packets
* np_ecn_marked_roce_packets - CE marked RoCE packets received by
notification point
It also avoids returning ENOSYS which is specific for invalid
system call and produces the following checkpatch.pl warning.
WARNING: ENOSYS means 'invalid syscall nr' and nothing else
+ return -ENOSYS;
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The mthca driver didn't check the pointer supplied to functions
mthca_cmd_poll() and mthca_cmd_wait(). This caused the following
smatch errors:
drivers/infiniband/hw/mthca/mthca_cmd.c:371 mthca_cmd_poll() error: we previously assumed 'out_param' could be null (see line 353)
drivers/infiniband/hw/mthca/mthca_cmd.c:454 mthca_cmd_wait() error: we previously assumed 'out_param' could be null (see line 432)
In reality, all callers of these functions that set the out_is_imm
flag provide the pointer too. However, it is better to check
again, to remove the smatch errors and achieve a warning-free subsystem.
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
A drop rule is described by an action drop and no destination.
If a user specified IB_FLOW_SPEC_ACTION_DROP then set the action
to MLX5_FLOW_CONTEXT_ACTION_DROP and clear the destination.
Signed-off-by: Slava Shwartsman <slavash@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This flow steering specification identifies a flow to be dropped by the HW.
If the user creates a flow with only the drop specification,
then all the packets that hit this flow will be dropped; otherwise the HW
will drop only the packets that match the other L2/L3/L4 specifications.
Signed-off-by: Slava Shwartsman <slavash@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This change adds the ability for flow steering to classify IPv4/6
packets with an MPLS tag (Ethertype 0x8847 and 0x8848) as standard IP
packets that hit IPv4/6 classified steering rules.
When a user added a flow rule with IP classification, the driver was
implicitly adding ethertype matching to the created rule in order
to distinguish between IPv4 and IPv6 protocols.
Since IP packets with an MPLS tag header have an MPLS ethertype, they missed
the rule and ended up hitting the default filters.
Such behavior prevented MPLS packets from undergoing inbound traffic
load balancing flows (if such were defined by configuring RSS) to
achieve higher throughput, the way that non-MPLS IP packets did.
Since our device is able to look past the MPLS tag and identify the
next protocol, we introduce this solution, which replaces Ethertype
matching with the device's capability to perform IP version parsing
and matching in order to distinguish between IPv4 and IPv6.
Therefore, whenever a flow with an IP spec is added and the device supports
IP version matching, the driver will implicitly add IP version matching to
the rule (based on the IP spec type) without Ethertype matching, which will
cause relevant MPLS tagged packets to hit this rule as well.
Otherwise (the device doesn't support IP version matching), we fall back to
setting Ethertype matching.
If the user's filters specify an L2 ethertype and an IP spec
the rule will then match both the ethertype and the IP version.
The device's support for IP version matching is reported by the
device via dedicated capability bit in query_device_cap and named
outer/inner_ip_version.
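The selection between the two matching strategies can be sketched as below. This is a simplified model of the implicit match only: struct demo_match and demo_ip_match are hypothetical, and the case where the user also supplies an L2 ethertype is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ETH_P_IP   0x0800	/* IPv4 ethertype */
#define DEMO_ETH_P_IPV6 0x86DD	/* IPv6 ethertype */

/* Hypothetical description of what the rule implicitly matches on. */
struct demo_match {
	bool match_ip_version;
	uint8_t ip_version;
	bool match_ethertype;
	uint16_t ethertype;
};

/*
 * Prefer IP version matching when the device reports support for it,
 * so MPLS-tagged IP packets can hit the rule; otherwise fall back to
 * ethertype matching.
 */
static struct demo_match demo_ip_match(bool is_ipv6, bool dev_has_ip_version)
{
	struct demo_match m = { false, 0, false, 0 };

	if (dev_has_ip_version) {
		m.match_ip_version = true;
		m.ip_version = is_ipv6 ? 6 : 4;
	} else {
		m.match_ethertype = true;
		m.ethertype = is_ipv6 ? DEMO_ETH_P_IPV6 : DEMO_ETH_P_IP;
	}
	return m;
}
```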
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This change fixes an incomplete validation of the user's
flow attributes list.
Previous implementation validated only matching of IPv4 Ethertype
to IPv4 spec of outer headers (in case both Ethernet with specified
Ethertype and IP specs were present) and lacked the validation of:
1. Matching of IPv6 Ethertype in Ethernet spec (if such exists) to an
IPv6 protocol spec (if such exists).
2. Validation of Ethertype to IP protocol matching on inner headers specs.
This could allow some combinations of mismatched Ethernet and IP
protocols to pass validation and be applied to the device.
The fix adds validation of IPv6 Ethertype and IP spec as well as
performing the scan on both outer and inner attributes.
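A simplified model of the extended check, applied to both header levels. The structures and function names are hypothetical; the sketch only rejects an IPv4 ethertype paired with an IPv6 spec and the reverse, and is not the driver's actual validation routine.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ETH_P_IP   0x0800	/* IPv4 ethertype */
#define DEMO_ETH_P_IPV6 0x86DD	/* IPv6 ethertype */

enum demo_ip_spec { DEMO_NO_IP, DEMO_IPV4, DEMO_IPV6 };

/* What one header level (outer or inner) of a flow asks to match. */
struct demo_layer {
	bool has_ethertype;	/* Ethernet spec with an ethertype given */
	uint16_t ethertype;
	enum demo_ip_spec ip;
};

/* Reject an IP ethertype that contradicts the IP spec on the same level. */
static bool demo_layer_ok(const struct demo_layer *l)
{
	if (!l->has_ethertype || l->ip == DEMO_NO_IP)
		return true;
	if (l->ethertype == DEMO_ETH_P_IP && l->ip == DEMO_IPV6)
		return false;
	if (l->ethertype == DEMO_ETH_P_IPV6 && l->ip == DEMO_IPV4)
		return false;
	return true;
}

/* Scan both outer and inner attributes. */
static bool demo_flow_ok(const struct demo_layer *outer,
			 const struct demo_layer *inner)
{
	return demo_layer_ok(outer) && demo_layer_ok(inner);
}
```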
Fixes: 038d2ef875 ("Add flow steering support")
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The kfree was called to free cqb, while it should free *cqb.
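The distinction matters whenever a buffer is handed back through a pointer-to-pointer parameter. The user-space sketch below uses free() in place of kfree(); the function name and the fail_late switch are hypothetical and exist only to exercise the error path.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate a buffer for the caller; on a late failure, release it again. */
static int demo_create_cqb(uint32_t **cqb, size_t words, bool fail_late)
{
	*cqb = calloc(words, sizeof(**cqb));
	if (!*cqb)
		return -ENOMEM;

	if (fail_late) {
		/* Free the buffer (*cqb), not the caller's pointer variable (cqb). */
		free(*cqb);
		*cqb = NULL;
		return -EINVAL;
	}
	return 0;
}
```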
Fixes: 1cbe6fc86c ("IB/mlx5: Add support for CQE compressing")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In order to enlarge the flow group size to 8k, we decrease
the number of flow group types to 6 and increase the flow
table size to 64k.
Flow group size is calculated as follows:
group_size = table_size / (#group_types + 1)
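As a worked instance of the formula, assuming "64k" means 65536 entries and using integer division: 65536 / (6 + 1) = 9362, which is above the 8k (8192) target. The helper name below is hypothetical.

```c
/* group_size = table_size / (#group_types + 1), using integer division. */
static unsigned int demo_group_size(unsigned int table_size,
				    unsigned int group_types)
{
	return table_size / (group_types + 1);
}
```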
Fixes: 038d2ef875 ('IB/mlx5: Add flow steering support')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Check that the required flow table size is supported
by the device. Return an ENOMEM error if no space is left.
In addition, change the create flow table routine
to return ENOMEM instead of ENOSPC.
Fixes: 038d2ef875 ('IB/mlx5: Add flow steering support')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Anonymous VMA (->vm_ops == NULL) cannot be shared, otherwise
it would lead to SIGBUS.
Remove the shared flags from the vma after we change it to be
anonymous.
This is easily reproduced by doing modprobe -r while running a
user-space application such as raw_ethernet_bw.
Fixes: 7c2344c3bb ('IB/mlx5: Implements disassociate_ucontext API')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When the driver disassociates a user context, it changes the vma to
anonymous by setting the vm_ops to null and zapping the vma ptes.
In order to avoid a race in the kernel, we need to take the write lock
before we change the vma entries.
Fixes: 7c2344c3bb ('IB/mlx5: Implements disassociate_ucontext API')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Anonymous VMA (->vm_ops == NULL) cannot be shared, otherwise
it would lead to SIGBUS.
Remove the shared flags from the vma after we change it to be
anonymous.
This is easily reproduced by doing modprobe -r while running a
user-space application such as raw_ethernet_bw.
Fixes: ae184ddeca ('IB/mlx4_ib: Disassociate support')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When the driver disassociates a user context, it changes the vma to
anonymous by setting the vm_ops to null and zapping the vma ptes.
In order to avoid a race in the kernel, we need to take the write lock
before we change the vma entries.
Fixes: ae184ddeca ('IB/mlx4_ib: Disassociate support')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
A warning message during SRIOV multicast cleanup should have actually been
a debug level message. The condition generating the warning does no harm
and can fill the message log.
In some cases, during testing, some tests were so intense as to swamp the
message log with these warning messages, causing a stall in the console
message log output task. This stall caused an NMI to be sent to all CPUs
(so that they all dumped their stacks into the message log).
Aside from the message flood causing an NMI, the tests all passed.
Once the message flood which caused the NMI is removed (by reducing the
warning message to debug level), the NMI no longer occurs.
Sample message log (console log) output illustrating the flood and
resultant NMI (snippets with comments and modified with ... instead
of hex digits, to satisfy checkpatch.pl):
<mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!...
*** About 4000 almost identical lines in less than one second ***
<mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!...
INFO: rcu_sched detected stalls on CPUs/tasks: { 17} (...)
*** { 17} above indicates that CPU 17 was the one that stalled ***
sending NMI to all CPUs:
...
NMI backtrace for cpu 17
CPU: 17 PID: 45909 Comm: kworker/17:2
Hardware name: HP ProLiant DL360p Gen8, BIOS P71 09/08/2013
Workqueue: events fb_flashcursor
task: ffff880478...... ti: ffff88064e...... task.ti: ffff88064e......
RIP: 0010:[ffffffff81......] [ffffffff81......] io_serial_in+0x15/0x20
RSP: 0018:ffff88064e257cb0 EFLAGS: 00000002
RAX: 0000000000...... RBX: ffffffff81...... RCX: 0000000000......
RDX: 0000000000...... RSI: 0000000000...... RDI: ffffffff81......
RBP: ffff88064e...... R08: ffffffff81...... R09: 0000000000......
R10: 0000000000...... R11: ffff88064e...... R12: 0000000000......
R13: 0000000000...... R14: ffffffff81...... R15: 0000000000......
FS: 0000000000......(0000) GS:ffff8804af......(0000) knlGS:000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080......
CR2: 00007f2a2f...... CR3: 0000000001...... CR4: 0000000000......
DR0: 0000000000...... DR1: 0000000000...... DR2: 0000000000......
DR3: 0000000000...... DR6: 00000000ff...... DR7: 0000000000......
Stack:
ffff88064e...... ffffffff81...... ffffffff81...... 0000000000......
ffffffff81...... ffff88064e...... ffffffff81...... ffffffff81......
ffffffff81...... ffff88064e...... ffffffff81...... 0000000000......
Call Trace:
[<ffffffff813d099b>] wait_for_xmitr+0x3b/0xa0
[<ffffffff813d0b5c>] serial8250_console_putchar+0x1c/0x30
[<ffffffff813d0b40>] ? serial8250_console_write+0x140/0x140
[<ffffffff813cb5fa>] uart_console_write+0x3a/0x80
[<ffffffff813d0aae>] serial8250_console_write+0xae/0x140
[<ffffffff8107c4d1>] call_console_drivers.constprop.15+0x91/0xf0
[<ffffffff8107d6cf>] console_unlock+0x3bf/0x400
[<ffffffff813503cd>] fb_flashcursor+0x5d/0x140
[<ffffffff81355c30>] ? bit_clear+0x120/0x120
[<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[<ffffffff810a5aef>] kthread+0xcf/0xe0
[<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[<ffffffff81645858>] ret_from_fork+0x58/0x90
[<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
Code: 48 89 e5 d3 e6 48 63 f6 48 03 77 10 8b 06 5d c3 66 0f 1f 44 00 00 66 66 66 6
As indicated in the stack trace above, the console output task got swamped.
Fixes: b9c5d6a643 ("IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV")
Cc: <stable@vger.kernel.org> # v3.6+
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In mlx4_ib_add, procedure mlx4_ib_alloc_eqs is called to allocate EQs.
However, in the mlx4_ib_add error flow, procedure mlx4_ib_free_eqs is not
called to free the allocated EQs.
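The missing cleanup described above follows the usual "unwind in reverse order" pattern. A minimal userspace sketch, assuming hypothetical demo_* names that only stand in for the driver's functions (this is not the mlx4 code):

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's resources; not the mlx4 API. */
struct demo_dev {
	int *eqs;       /* models the EQ table */
	int *counters;  /* models a later allocation that may fail */
};

static int demo_alloc_eqs(struct demo_dev *dev, size_t n)
{
	dev->eqs = calloc(n, sizeof(*dev->eqs));
	return dev->eqs ? 0 : -1;
}

static void demo_free_eqs(struct demo_dev *dev)
{
	free(dev->eqs);
	dev->eqs = NULL;
}

/* fail_late forces the step after the EQ allocation to fail. */
static int demo_add(struct demo_dev *dev, int fail_late)
{
	if (demo_alloc_eqs(dev, 8))
		return -1;

	dev->counters = fail_late ? NULL : calloc(4, sizeof(*dev->counters));
	if (!dev->counters)
		goto err_free_eqs;

	return 0;

err_free_eqs:
	/* Undo the earlier step so the error path leaks nothing. */
	demo_free_eqs(dev);
	return -1;
}
```

Calling demo_add(&dev, 1) returns -1 and leaves dev.eqs set to NULL, which is the behaviour the error flow should have.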
Fixes: e605b743f3 ("IB/mlx4: Increase the number of vectors (EQs) available for ULPs")
Cc: <stable@vger.kernel.org> # v3.4+
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
On some environments, such as certain SR-IOV VF configurations, RoCE
isn't supported for mlx4 Ethernet ports. Currently the driver will
not open an IB device on that port.
This is problematic since we do want user-space RAW Ethernet QP functionality
to remain in place. To that end, enhance the relevant driver flows such that we
do create a device instance in that case.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The kernel commit cited below restructured ib device management
so that the device kobject is initialized in procedure ib_alloc_device.
The kobject is later added to the device hierarchy in the
ib_register_device call stack, in procedure
ib_device_register_sysfs (which calls device_add).
However, in the ib_device_register_sysfs error flow, if an error
occurs following the call to device_add, the cleanup procedure
device_unregister is called. This call results in the device object
being deleted -- which results in various use-after-free crashes.
The correct cleanup call is device_del -- which undoes device_add
without deleting the device object.
The device object will then (correctly) be deleted in the
ib_register_device caller's error cleanup flow, when the caller invokes
ib_dealloc_device.
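The difference between the two cleanup calls can be illustrated with a toy reference-count model. This is only a sketch: the toy_* names are hypothetical, and the model assumes that "unregister" amounts to "del" followed by dropping a reference:

```c
#include <stdbool.h>

/* Toy model of a refcounted device; not the driver core API. */
struct toy_device {
	int refcount;  /* 1 after init; the object is freed at 0 */
	bool added;    /* visible in the device hierarchy */
	bool freed;    /* set when the last reference is dropped */
};

static void toy_initialize(struct toy_device *d)
{
	d->refcount = 1;
	d->added = false;
	d->freed = false;
}

static void toy_put(struct toy_device *d)
{
	if (--d->refcount == 0)
		d->freed = true;
}

static void toy_add(struct toy_device *d)
{
	d->added = true;
}

/* Undoes toy_add only; the caller's reference stays. */
static void toy_del(struct toy_device *d)
{
	d->added = false;
}

/* del followed by put: also drops the caller's reference. */
static void toy_unregister(struct toy_device *d)
{
	toy_del(d);
	toy_put(d);
}

/* Models the sysfs registration step with a failure after the add. */
static int toy_register_sysfs(struct toy_device *d, bool fail_after_add,
			      bool use_unregister)
{
	toy_add(d);
	if (fail_after_add) {
		if (use_unregister)
			toy_unregister(d);  /* object is gone too early */
		else
			toy_del(d);         /* object survives for the caller */
		return -1;
	}
	return 0;
}
```

With use_unregister set, the object is already freed when the caller later runs its own cleanup; with the del variant, the caller's final toy_put is what frees it.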
Fixes: 55aeed0654 ("IB/core: Make ib_alloc_device init the kobject")
Cc: <stable@vger.kernel.org> # v4.2+
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
rtnl_lock must be held when ipoib_stop is called. The flow then
clears the IPOIB_FLAG_ADMIN_UP and IPOIB_FLAG_OPER_UP flags, and
waits for mcast completion if IPOIB_MCAST_FLAG_BUSY is set.
On the other hand, the multicast join task initializes a mcast
completion, sets IPOIB_MCAST_FLAG_BUSY and calls ipoib_mcast_join.
If the IPOIB_FLAG_OPER_UP flag is not set, this call returns -EINVAL
without completing the mcast completion, which leads to a deadlock.
ipoib_stop                          |
    |                               |
clear_bit(IPOIB_FLAG_ADMIN_UP)      |
    |                               |
Context Switch                      |
    |                       ipoib_mcast_join_task
    |                               |
    |                       spin_lock_irq(lock)
    |                               |
    |                       init_completion(mcast)
    |                               |
    |                       set_bit(IPOIB_MCAST_FLAG_BUSY)
    |                               |
    |                       Context Switch
    |                               |
clear_bit(IPOIB_FLAG_OPER_UP)       |
    |                               |
spin_lock_irqsave(lock)             |
    |                               |
Context Switch                      |
    |                       ipoib_mcast_join
    |                       return (-EINVAL)
    |                               |
    |                       spin_unlock_irq(lock)
    |                               |
    |                       Context Switch
    |                               |
ipoib_mcast_dev_flush               |
wait_for_completion(mcast)          |
ipoib_stop will wait for the mcast completion forever, and will
not release the rtnl_lock. As a result, a panic occurs with the
following trace:
[13441.639268] Call Trace:
[13441.640150] [<ffffffff8168b579>] schedule+0x29/0x70
[13441.641038] [<ffffffff81688fc9>] schedule_timeout+0x239/0x2d0
[13441.641914] [<ffffffff810bc017>] ? complete+0x47/0x50
[13441.642765] [<ffffffff810a690d>] ? flush_workqueue_prep_pwqs+0x16d/0x200
[13441.643580] [<ffffffff8168b956>] wait_for_completion+0x116/0x170
[13441.644434] [<ffffffff810c4ec0>] ? wake_up_state+0x20/0x20
[13441.645293] [<ffffffffa05af170>] ipoib_mcast_dev_flush+0x150/0x190 [ib_ipoib]
[13441.646159] [<ffffffffa05ac967>] ipoib_ib_dev_down+0x37/0x60 [ib_ipoib]
[13441.647013] [<ffffffffa05a4805>] ipoib_stop+0x75/0x150 [ib_ipoib]
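The hang above comes from marking the group busy before a check that can fail. A minimal single-threaded sketch, assuming hypothetical toy_* names and no real locking, contrasts that ordering with one possible alternative. It is an illustration only and is not taken from the patch itself:

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model; the fields only mirror the flags named above. */
struct toy_priv {
	bool oper_up;  /* models IPOIB_FLAG_OPER_UP */
};

struct toy_mcast {
	bool busy;     /* models IPOIB_MCAST_FLAG_BUSY */
	bool done;     /* models the completion having fired */
};

/* Ordering described above: BUSY is set before the failing check. */
static int toy_join_racy(struct toy_priv *p, struct toy_mcast *m)
{
	m->done = false;  /* models init_completion() */
	m->busy = true;   /* models set_bit(IPOIB_MCAST_FLAG_BUSY) */
	if (!p->oper_up)
		return -EINVAL;  /* nobody will ever complete m */
	return 0;
}

/* One possible ordering: check first, mark busy only on success. */
static int toy_join_checked(struct toy_priv *p, struct toy_mcast *m)
{
	if (!p->oper_up)
		return -EINVAL;
	m->done = false;
	m->busy = true;
	return 0;
}

/* The stop path waits only on groups that are busy and not completed. */
static bool toy_stop_would_block(const struct toy_mcast *m)
{
	return m->busy && !m->done;
}
```

With oper_up cleared, toy_join_racy leaves the group in a state where the stop path would wait, while toy_join_checked does not.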
Fixes: 08bc327629 ("IB/ipoib: fix for rare multicast join race condition")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Update the broadcast address in the priv->broadcast object when the
PKey value changes in index 0; otherwise the multicast GID will
keep the previous PKey value and will not be updated.
This leaves the interface in the down state because it keeps using the
old PKey value.
For example, in an SR-IOV environment, if the PF changes the value of PKey
index 0 for one of the VFs, then the VF receives a PKey change event that
triggers a heavy flush. This flush calls update_parent_pkey, which updates the
broadcast object and its relevant members. If the multicast GID is not
updated in this case, the interface state will be down.
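The update described above can be sketched in userspace as follows. The toy_* names are hypothetical, and the layout is an assumption made for illustration: a 20-byte hardware address whose last 16 bytes are the GID, with the PKey stored in address bytes 8 and 9 and the full-membership bit (0x8000) set:

```c
#include <stdint.h>
#include <string.h>

#define TOY_HW_ADDR_LEN 20  /* assumed: 4 bytes followed by a 16-byte GID */
#define TOY_GID_OFFSET   4

struct toy_ipoib {
	uint16_t pkey;
	uint8_t broadcast[TOY_HW_ADDR_LEN];  /* netdev broadcast address */
	uint8_t mgid[16];                    /* broadcast group's multicast GID */
};

/* Refresh both the broadcast address and the GID derived from it. */
static void toy_update_pkey(struct toy_ipoib *p, uint16_t new_pkey)
{
	p->pkey = new_pkey | 0x8000;  /* assumed full-membership bit */
	p->broadcast[8] = (uint8_t)(p->pkey >> 8);
	p->broadcast[9] = (uint8_t)(p->pkey & 0xff);
	/* Without this copy the GID would keep the old PKey bytes. */
	memcpy(p->mgid, p->broadcast + TOY_GID_OFFSET, sizeof(p->mgid));
}
```

The point of the sketch is the last step: the GID has to be refreshed together with the address, or the two disagree about the PKey.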
Fixes: c290414169 ("IPoIB: Fix pkey change flow for virtualization environments")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
For an RC QP there is no need to resolve the outgoing interface
for each packet, as it does not change during the QP life cycle.
Instead, cache the interface on the socket and use that one.
This improves performance by about 12% by sparing redundant
calls to rxe_find_route.
ib_send_bw -d rxe0 -x 1 -n 9000 -e -s $((1024 * 1024 )) -l 100
----------------------------------------------------------------------------------------
|        | bytes   | iterations | BW peak[MB/sec] | BW average[MB/sec] | MsgRate[Mpps] |
----------------------------------------------------------------------------------------
| before | 1048576 | 9000       | inf             | 551.21             | 0.000551      |
| after  | 1048576 | 9000       | inf             | 615.54             | 0.000616      |
----------------------------------------------------------------------------------------
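The caching idea can be sketched as follows. This is a simplified model with hypothetical toy_* names: it caches the result on the QP object rather than on a socket, and a counter stands in for the cost of the route lookup:

```c
#include <stddef.h>

struct toy_route {
	int ifindex;
};

struct toy_qp {
	int is_rc;                 /* reliable-connected QP */
	struct toy_route *cached;  /* NULL until the first lookup */
};

static int toy_lookups;            /* counts expensive lookups */
static struct toy_route toy_table = { 3 };

/* Stand-in for the expensive per-packet route resolution. */
static struct toy_route *toy_find_route(void)
{
	toy_lookups++;
	return &toy_table;
}

static struct toy_route *toy_route_for_packet(struct toy_qp *qp)
{
	if (!qp->is_rc)
		return toy_find_route();  /* no caching for other QP types */
	if (!qp->cached)
		qp->cached = toy_find_route();
	return qp->cached;
}
```

Sending several packets on an RC QP then costs a single lookup, while other QP types still resolve the route every time.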
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Function rxe_rcv is used only internally in RXE and doesn't need to be
exported. This patch removes the export declaration.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch avoids RNR NAK timer and retransmit timer initialization and
cleanup for non-RC QPs (such as UD QP, GSI QP).
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Expose new counters using the get_hw_stats callback.
We expose the following counters:
+---------------------+----------------------------------------+
| Name                | Description                            |
|---------------------+----------------------------------------|
|sent_pkts            | number of sent packets                 |
|---------------------+----------------------------------------|
|rcvd_pkts            | number of received packets             |
|---------------------+----------------------------------------|
|out_of_sequence      | number of errors due to packet         |
|                     | transport sequence number              |
|---------------------+----------------------------------------|
|duplicate_request    | number of received duplicated packets. |
|                     | A request that previously executed is  |
|                     | named duplicated.                      |
|---------------------+----------------------------------------|
|rcvd_rnr_err         | number of received RNR by completer    |
|---------------------+----------------------------------------|
|send_rnr_err         | number of sent RNR by responder        |
|---------------------+----------------------------------------|
|rcvd_seq_err         | number of out of sequence packets      |
|                     | received                               |
|---------------------+----------------------------------------|
|ack_deffered         | number of deferred handling of ack     |
|                     | packets.                               |
|---------------------+----------------------------------------|
|retry_exceeded_err   | number of times retry exceeded         |
|---------------------+----------------------------------------|
|completer_retry_err  | number of times completer decided to   |
|                     | retry                                  |
|---------------------+----------------------------------------|
|send_err             | number of failed send packets          |
+---------------------+----------------------------------------+
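A counter set like the one in the table is commonly kept as an enum-indexed array next to a table of names. The sketch below assumes that layout. Only the counter name strings come from the table; the enum, the toy_* functions and the copy-out interface are hypothetical and do not reproduce the real get_hw_stats signature:

```c
#include <stddef.h>
#include <stdint.h>

enum toy_counter {
	TOY_CNT_SENT_PKTS,
	TOY_CNT_RCVD_PKTS,
	TOY_CNT_OUT_OF_SEQUENCE,
	TOY_CNT_DUPLICATE_REQUEST,
	TOY_CNT_RCVD_RNR_ERR,
	TOY_CNT_SEND_RNR_ERR,
	TOY_CNT_RCVD_SEQ_ERR,
	TOY_CNT_ACK_DEFERRED,
	TOY_CNT_RETRY_EXCEEDED_ERR,
	TOY_CNT_COMPLETER_RETRY_ERR,
	TOY_CNT_SEND_ERR,
	TOY_NUM_COUNTERS
};

/* Names exactly as listed in the table above. */
static const char *const toy_counter_names[TOY_NUM_COUNTERS] = {
	[TOY_CNT_SENT_PKTS]           = "sent_pkts",
	[TOY_CNT_RCVD_PKTS]           = "rcvd_pkts",
	[TOY_CNT_OUT_OF_SEQUENCE]     = "out_of_sequence",
	[TOY_CNT_DUPLICATE_REQUEST]   = "duplicate_request",
	[TOY_CNT_RCVD_RNR_ERR]        = "rcvd_rnr_err",
	[TOY_CNT_SEND_RNR_ERR]        = "send_rnr_err",
	[TOY_CNT_RCVD_SEQ_ERR]        = "rcvd_seq_err",
	[TOY_CNT_ACK_DEFERRED]        = "ack_deffered",
	[TOY_CNT_RETRY_EXCEEDED_ERR]  = "retry_exceeded_err",
	[TOY_CNT_COMPLETER_RETRY_ERR] = "completer_retry_err",
	[TOY_CNT_SEND_ERR]            = "send_err",
};

struct toy_dev {
	uint64_t stats[TOY_NUM_COUNTERS];
};

static void toy_counter_inc(struct toy_dev *dev, enum toy_counter c)
{
	dev->stats[c]++;
}

/*
 * Copy every counter into out[]. Returns the number of counters
 * written, or -1 if the caller's buffer is too small.
 */
static int toy_get_hw_stats(const struct toy_dev *dev, uint64_t *out,
			    size_t out_len)
{
	size_t i;

	if (out_len < (size_t)TOY_NUM_COUNTERS)
		return -1;
	for (i = 0; i < (size_t)TOY_NUM_COUNTERS; i++)
		out[i] = dev->stats[i];
	return TOY_NUM_COUNTERS;
}
```

Keeping the names and the values indexed by the same enum means a new counter only needs one enum entry and one name.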
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
A couple of spots were missed in the original patch to implement this
change. Add those spots.
Fixes: a9a42886d0 ("cxgb4: Convert PDBG to pr_debug")
Signed-off-by: Doug Ledford <dledford@redhat.com>
* Multiplications for the size determination of memory allocations
indicated that array data structures should be processed.
Thus use the corresponding function "kcalloc".
This issue was detected by using the Coccinelle software.
* Replace the specification of data types by pointer dereferences
to make the corresponding size determinations a bit safer according to
the Linux coding style convention.
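A userspace analogue of both points, using calloc in place of kcalloc. The toy_* names are hypothetical:

```c
#include <stdlib.h>

struct toy_entry {
	int id;
	long weight;
};

/* Allocate a zeroed array of n entries, or return NULL on failure. */
static struct toy_entry *toy_alloc_entries(size_t n)
{
	struct toy_entry *tbl;

	/*
	 * Count and element size are passed separately instead of as a
	 * hand-written product, and sizeof(*tbl) follows the pointer's
	 * type if the declaration ever changes.
	 */
	tbl = calloc(n, sizeof(*tbl));
	return tbl;
}
```

The equivalent malloc(n * sizeof(struct toy_entry)) would need both the multiplication and the type name to be kept correct by hand.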
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
* A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".
This issue was detected by using the Coccinelle software.
* Replace the specification of a data structure by a pointer dereference
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
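The benefit of an array allocation helper is that the multiplication is checked for overflow before the allocation happens. A userspace sketch of that check, with a hypothetical toy_malloc_array standing in for kmalloc_array:

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate room for n elements of the given size without zeroing.
 * Returns NULL if n * size would overflow or if malloc fails.
 */
static void *toy_malloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;  /* n * size would wrap around */
	return malloc(n * size);
}
```

A caller would typically write p = toy_malloc_array(count, sizeof(*p)), which also covers the second point about using a pointer dereference for the size.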
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Replace the specification of a data structure by a reference to
the desired member as the parameter for the operator "sizeof" to make
the corresponding size determination a bit safer according to
the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
* Pass a product for a call of the function "vmalloc_user" without storing
it in an intermediate variable.
* Delete the local variable "memsize" which became unnecessary with
this refactoring.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
* Multiplications for the size determination of memory allocations
indicated that array data structures should be processed.
Thus reuse the corresponding function "kcalloc".
This issue was detected by using the Coccinelle software.
* Replace the specification of a data type by a pointer dereference
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
* A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus reuse the corresponding function "kcalloc".
This issue was detected by using the Coccinelle software.
* Replace the specification of a data type by a pointer dereference
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use a more typical logging style.
Miscellanea:
o Obsolete the c4iw_debug module parameter
o Coalesce formats
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Convert printks to pr_<level>
Miscellanea:
o Coalesce formats
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Using the normal mechanism, not an indirected one, is clearer.
Miscellanea:
o Coalesce formats
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Convert printks to pr_<level>
Miscellanea:
o Coalesce formats
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
IPoIB driver now uses the new set of callback functions.
If the hardware provider supports the new ipoib_options implementation,
the driver uses the callbacks in its data path flows, otherwise it uses the
driver default implementation for all data flows in its code.
The default implementation wasn't changed and is exactly as it was before
the introduction of acceleration support.
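The "provider callbacks if available, otherwise defaults" selection can be sketched with a table of function pointers. All names below are hypothetical and do not reproduce the actual IPoIB structures:

```c
#include <stddef.h>

struct toy_netdev;

/* Hypothetical callback table for the data path. */
struct toy_ops {
	int (*send)(struct toy_netdev *dev, int len);
};

struct toy_netdev {
	const struct toy_ops *ops;
	int sent_by_default;  /* bytes handled by the default path */
	int sent_by_hw;       /* bytes handled by provider callbacks */
};

static int toy_default_send(struct toy_netdev *dev, int len)
{
	dev->sent_by_default += len;
	return 0;
}

static int toy_hw_send(struct toy_netdev *dev, int len)
{
	dev->sent_by_hw += len;
	return 0;
}

static const struct toy_ops toy_default_ops = { .send = toy_default_send };
static const struct toy_ops toy_hw_ops = { .send = toy_hw_send };

/* Use the provider's callbacks when offered, else the defaults. */
static void toy_setup(struct toy_netdev *dev,
		      const struct toy_ops *provider_ops)
{
	dev->ops = provider_ops ? provider_ops : &toy_default_ops;
	dev->sent_by_default = 0;
	dev->sent_by_hw = 0;
}

/* The data path only ever goes through the selected table. */
static int toy_xmit(struct toy_netdev *dev, int len)
{
	return dev->ops->send(dev, len);
}
```

Because the choice is made once at setup time, the data path itself does not need to know which implementation is in use.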
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Make ipoib_priv point to netdev_priv where the code calls netdev_priv.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>