Commit Graph

737244 Commits

Author SHA1 Message Date
Doug Ledford
2ea32cd6c7 mlx5-updates-2018-02-28-2 (IPSec-2)

Merge tag 'mlx5-updates-2018-02-28-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into k.o/wip/dl-for-next

mlx5-updates-2018-02-28-2 (IPSec-2)

This series follows our previous one to lay out the foundations for IPSec
in user-space and extend current kernel netdev IPSec support. As noted in
our previous pull request cover letter "mlx5-updates-2018-02-28-1 (IPSec-1)",
the IPSec mechanism will be supported through our flow steering mechanism.
Therefore, we need to change the initialization order. Furthermore, IPsec
is supported in both egress and ingress. Since our current flow
steering is ingress only, we add an empty (only implemented through FPGA
steering ops) egress namespace to handle that case. We also implement
the required flow steering callbacks and logic in our FPGA driver.

We also extend the FPGA support to cover ESN and modification of an xfrm.
To that end, we add support for a new FPGA command interface that handles
both. The other required bits are added too. The new features and
requirements are advertised via cap bits.

Last but not least, we revise our driver's accel_esp API. This API will be
shared between our netdev and IB driver, so we need to have all the required
functionality from both worlds.

Regards,
Aviad and Matan

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-13 15:49:34 -04:00
Steve Wise
29cf1351d4 RDMA/nldev: provide detailed PD information
Implement the RDMA nldev netlink interface for dumping detailed PD
information.

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Steve Wise
5292443431 mlx4_ib: zero out struct ib_pd when allocating
Zero out the fields of the struct ib_pd for user mode pds so that
users querying pds via nldev will not get garbage.  For simplicity,
use kzalloc() to allocate the mlx4_ib_pd struct.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Steve Wise
fccec5b89a RDMA/nldev: provide detailed MR information
Implement the RDMA nldev netlink interface for dumping detailed
MR information.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Steve Wise
e6f0330106 mlx4_ib: set user mr attributes in struct ib_mr
Setting iova, length, and page_size allows this information to be
seen via NLDEV netlink queries, which can aid in user rdma debugging.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Steve Wise
750fb1656a iw_cxgb4: initialize ib_mr fields for user mrs
Some of the struct ib_mr fields weren't getting initialized.  This was
benign, but will cause problems when dumping the mr resource via
nldev/restrack.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Steve Wise
a34fc0893e RDMA/nldev: provide detailed CQ information
Implement the RDMA nldev netlink interface for dumping detailed
CQ information.

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Steve Wise
00313983cd RDMA/nldev: provide detailed CM_ID information
Implement RDMA nldev netlink interface to get detailed CM_ID information.

Because cm_id's are attached to rdma devices in various work queue
contexts, the pid and task information at rdma_restrack_add() time is sometimes
not useful.  For example, an nvme/f host connection cm_id ends up being
bound to a device in a work queue context and the resulting pid at attach
time no longer exists after connection setup.  So instead we mark all
cm_id's created via the rdma_ucm as "user", and all others as "kernel".
This required tweaking the restrack code a little.  It also required
wrapping some rdma_cm functions to allow passing the module name string.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Steve Wise
a3b641af72 RDMA/CM: move rdma_id_private to cma_priv.h
Move struct rdma_id_private to a new header cma_priv.h so the resource
tracking services in core/nldev.c can read useful information about cm_ids.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Steve Wise
d12ff62482 RDMA/nldev: common resource dumpit function
Create a common dumpit function that can be used by all common resource
types.  This reduces code replication and simplifies the code as we add
more resource types.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Steve Wise
88831a2cfe RDMA/restrack: clean up res_to_dev()
Simplify res_to_dev() to make it easier to read/maintain.

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Leon Romanovsky
31135eb388 net/mlx5: Fix wrongly assigned CQ reference counter
A kernel compiled with CONFIG_REFCOUNT_FULL produces the following
warning. The reason is that the initial value of a refcount_t is supposed
to be greater than 0, so change the initial value.

[    3.106634] ------------[ cut here ]------------
[    3.107756] refcount_t: increment on 0; use-after-free.
[    3.109130] WARNING: CPU: 0 PID: 1 at lib/refcount.c:153 refcount_inc+0x27/0x30
[    3.110085] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1-00028-gf683e04bdccc #137
[    3.110085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    3.110085] RIP: 0010:refcount_inc+0x27/0x30
[    3.110085] RSP: 0000:ffffaa620000fba0 EFLAGS: 00010286
[    3.110085] RAX: 0000000000000000 RBX: ffff9a6d1a1821c8 RCX: ffffffff98a50f48
[    3.110085] RDX: 0000000000000001 RSI: 0000000000000086 RDI: 0000000000000246
[    3.110085] RBP: ffff9a6d1ac800a0 R08: 0000000000000289 R09: 000000000000000a
[    3.110085] R10: fffff03bc0682840 R11: ffffffff9949856d R12: ffff9a6d1b4a4000
[    3.110085] R13: 0000000000000000 R14: ffff9a6d1a0a6c00 R15: ffffaa620000fc5c
[    3.110085] FS:  0000000000000000(0000) GS:ffff9a6d1fc00000(0000) knlGS:0000000000000000
[    3.110085] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.110085] CR2: 0000000000000000 CR3: 000000000ba0a000 CR4: 00000000000006b0
[    3.110085] Call Trace:
[    3.110085]  mlx5_core_create_cq+0xde/0x250
[    3.110085]  ? __kmalloc+0x1ce/0x1e0
[    3.110085]  mlx5e_create_cq+0x15c/0x1e0
[    3.110085]  mlx5e_open_drop_rq+0xea/0x190
[    3.110085]  mlx5e_attach_netdev+0x53/0x140
[    3.110085]  mlx5e_attach+0x3d/0x60
[    3.110085]  mlx5e_add+0x11d/0x2f0
[    3.110085]  mlx5_add_device+0x77/0x170
[    3.110085]  mlx5_register_interface+0x74/0xc0
[    3.110085]  ? set_debug_rodata+0x11/0x11
[    3.110085]  init+0x67/0x72
[    3.110085]  ? mlx4_en_init_ptys2ethtool_map+0x346/0x346
[    3.110085]  do_one_initcall+0x98/0x147
[    3.110085]  ? set_debug_rodata+0x11/0x11
[    3.110085]  kernel_init_freeable+0x164/0x1e0
[    3.110085]  ? rest_init+0xb0/0xb0
[    3.110085]  kernel_init+0xa/0x100
[    3.110085]  ret_from_fork+0x35/0x40
[    3.110085] Code: 00 00 00 00 e8 ab ff ff ff 84 c0 74 02 f3 c3 80 3d 3b c3 64 01 00 75 f5 48 c7 c7 68 0b 81 98 c6 05 2b c3 64 01 01 e8 79 d7 a3 ff <0f> ff c3 66 0f 1f 44 00 00 8b 06 83 f8 ff 74 39 31 c9 39 f8 89
[    3.110085] ---[ end trace a0068e1c68438a74 ]---

Fixes: f105b45bf7 ("net/mlx5: CQ hold/put API")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07 15:54:36 -08:00
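
To illustrate why a counter that starts at zero trips the check reported in the commit above, here is a minimal user-space sketch. The type toy_refcount and both helpers are hypothetical stand-ins written for this example; they are not the kernel's refcount_t API and only mimic the "increment on 0" rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for refcount_t: zero means "object already released". */
struct toy_refcount {
    unsigned int refs;
};

/*
 * Take a reference. An increment from zero is refused, which mirrors the
 * condition behind the "increment on 0; use-after-free" warning.
 */
static bool toy_refcount_inc(struct toy_refcount *r)
{
    assert(r);
    if (r->refs == 0)
        return false;
    r->refs++;
    return true;
}

/* Drop a reference; returns true when the last reference was released. */
static bool toy_refcount_dec_and_test(struct toy_refcount *r)
{
    assert(r && r->refs > 0);
    r->refs--;
    return r->refs == 0;
}
```

With this model, an object created with refs = 0 can never be held, while an object created with refs = 1 (the creator's own reference) behaves as expected for later hold/put pairs.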
Aviad Yehezkel
cb01008390 net/mlx5: IPSec, Add support for ESN
Currently ESN is not supported with IPSec device offload.

This patch adds ESN support to IPsec device offload.
It implements a new xfrm device operation to synchronize the offloading
device's ESN with the SN received by xfrm, and a new QP command to update
the SA state at the following points:

           ESN 1                    ESN 2                  ESN 3
|-----------*-----------|-----------*-----------|-----------*
^           ^           ^           ^           ^           ^

^ - marks where QP command invoked to update the SA ESN state
    machine.
| - marks the start of the ESN scope (0 to 2^32-1). At this point move SA
    ESN overlap bit to zero and increment ESN.
* - marks the middle of the ESN scope (2^31). At this point move SA
    ESN overlap bit to one.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Yossef Efraim <yossefe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07 15:54:36 -08:00
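
The state machine in the diagram of the ESN commit above can be sketched in user-space C as follows. This is only a model under stated assumptions: struct esn_state, esn_advance() and the updates counter are hypothetical names for this example, sequence numbers are assumed to arrive in order, and the counter merely stands in for "a QP command would be sent here".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-SA ESN state. */
struct esn_state {
    uint32_t esn;       /* upper 32 bits of the extended sequence number */
    bool overlap;       /* set once the low 32-bit SN passed 2^31 */
    unsigned updates;   /* number of state updates (stand-in for QP commands) */
};

/*
 * Feed the low 32 bits of the most recently received sequence number.
 * Returns true when the SA state changed and an update would be issued.
 */
static bool esn_advance(struct esn_state *st, uint32_t seq_lo)
{
    bool upper_half = seq_lo >= (UINT32_C(1) << 31);

    assert(st);
    if (upper_half && !st->overlap) {
        /* '*' in the diagram: middle of the scope, set the overlap bit. */
        st->overlap = true;
    } else if (!upper_half && st->overlap) {
        /* '|' in the diagram: SN wrapped, clear overlap and bump the ESN. */
        st->overlap = false;
        st->esn++;
    } else {
        return false;
    }
    st->updates++;
    return true;
}
```

Feeding 100, 0x80000000, 0xfffffff0 and then 5 produces exactly two updates: one at the midpoint and one at the wrap, after which esn is 1 and the overlap bit is clear.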
Aviad Yehezkel
75ef3f5515 net/mlx5e: Added common function for to_ipsec_sa_entry
New function for getting driver internal sa entry from xfrm state.
All checks are done in one function.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07 15:54:35 -08:00
Aviad Yehezkel
05564d0ae0 net/mlx5: Add flow-steering commands for FPGA IPSec implementation
In order to add a context to the FPGA, we need to get both the software
transform context (which includes the keys, etc) and the
source/destination IPs (which are included in the steering
rule). Therefore, we register a new set of firmware-like commands for
the FPGA. Each time a rule is added, the steering core infrastructure
calls the FPGA command layer. If the rule is intended for the FPGA,
it combines the IPs information with the software transformation
context and creates the respective hardware transform.
Afterwards, it calls the standard steering command layer.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07 15:54:35 -08:00
Aviad Yehezkel
d6c4f0298c net/mlx5: Refactor accel IPSec code
The current code has one layer that executes FPGA commands, and
the Ethernet part uses this code directly. Since downstream patches
introduce support for IPSec in mlx5_ib, we need to provide some
abstractions. This patch refactors the accel code into one layer
that creates a software IPSec transformation and another one which
creates the actual hardware context.
The internal command implementation is now hidden in the FPGA
core layer. The code also adds the ability to share FPGA hardware
contexts. If two contexts are the same, only a reference count
is taken.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07 15:54:34 -08:00
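
The context sharing mentioned in the refactoring commit above ("if two contexts are the same, only a reference count is taken") can be illustrated with a small user-space sketch. Everything here is an assumption made for the example: the hw_params fields, the fixed-size table and the hw_ctx_get()/hw_ctx_put() names are hypothetical and do not reflect the driver's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CTX 8

/* Hypothetical parameters that identify a hardware context. */
struct hw_params {
    unsigned int spi;
    unsigned char key[16];
};

struct hw_ctx {
    struct hw_params params;
    unsigned int refs;      /* 0 means the slot is free */
};

static struct hw_ctx ctx_table[MAX_CTX];

static bool params_equal(const struct hw_params *a, const struct hw_params *b)
{
    return a->spi == b->spi && memcmp(a->key, b->key, sizeof(a->key)) == 0;
}

/* Return an existing context with identical parameters, or create one. */
static struct hw_ctx *hw_ctx_get(const struct hw_params *p)
{
    struct hw_ctx *free_slot = NULL;

    assert(p);
    for (size_t i = 0; i < MAX_CTX; i++) {
        struct hw_ctx *c = &ctx_table[i];

        if (c->refs == 0) {
            if (!free_slot)
                free_slot = c;
        } else if (params_equal(&c->params, p)) {
            c->refs++;      /* same context: only take a reference */
            return c;
        }
    }
    if (!free_slot)
        return NULL;        /* table full */
    free_slot->params = *p;
    free_slot->refs = 1;
    return free_slot;
}

/* Drop a reference; the slot becomes reusable when the count reaches 0. */
static void hw_ctx_put(struct hw_ctx *c)
{
    assert(c && c->refs > 0);
    c->refs--;
}
```

Two callers asking for identical parameters receive the same hw_ctx pointer with refs == 2, while different parameters get a separate slot.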
Aviad Yehezkel
af9fe19d66 net/mlx5: Added required metadata capability for ipsec
Currently our device requires additional metadata in the packet
to perform ipsec crypto offload.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07 15:54:34 -08:00
Aviad Yehezkel
1d2005e204 net/mlx5: Export ipsec capabilities
We will need that for ipsec verbs.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07 15:54:30 -08:00
Aviad Yehezkel
65802f4800 net/mlx5: IPSec, Add command V2 support
This patch adds V2 command support.
New FPGA devices support extended features (UDP encap, ESN, etc.). These
features require a new hardware SADB format, so we add a new version
of the commands to manipulate it.

Signed-off-by: Yossef Efraim <yossefe@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07 15:53:18 -08:00
Yossi Kuperman
788a821076 net/mlx5e: IPSec, Add support for ESP trailer removal by hardware
Current hardware decrypts and authenticates incoming ESP packets.
Subsequently, the software extracts the nexthdr field, truncates the
trailer and adjusts csum accordingly.

With this patch and a capable device, the trailer is removed
by the hardware and the nexthdr field is conveyed via PET. This way
we avoid both the need to access the trailer (cache miss) and to
compute its relative checksum, which significantly improves
performance.

Experiments with netperf show that trailer removal improves performance by
2Gbps in both forwarding and host-to-host configurations.

Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07 15:53:18 -08:00
Yossi Kuperman
581fdddee4 net/mlx5: IPSec, Generalize sandbox QP commands
The current code assumes only SA QP commands.
Refactor in order to pave the way for new QP commands:
1. Generic cmd response format.
2. SA cmd checks are in dedicated functions.
3. Aligned debug prints.

Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-07 15:53:17 -08:00
Saeed Mahameed
d83a69c2b1 net/mlx5: Use MLX5_IPSEC_DEV macro for ipsec caps
Fix a build break where mlx5_accel_ipsec_device_caps is not defined when
MLX5_ACCEL is not selected. Use MLX5_IPSEC_DEV instead, which handles
that case.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reported-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:49:19 -08:00
Yishai Hadas
d50a8a96ee IB/mlx4: Move mlx4_uverbs_ex_query_device_resp to include/uapi/
This struct is involved in the user API for mlx4 and should not be hidden
inside a driver header file.

Fixes: 09d208b258 ("IB/mlx4: Add report for RSS capabilities by vendor channel")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-07 16:10:07 -07:00
Doug Ledford
1abb791fcd Merge tag 'mlx5-updates-2018-02-28-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into k.o/wip/dl-for-next
mlx5-updates-2018-02-28-1 (IPSec-1)

This series consists of some fixes and refactors for the mlx5 drivers,
especially around the FPGA and flow steering. Most of them are trivial
fixes and are the foundation of allowing IPSec acceleration from user-space.

We use flow steering abstraction in order to accelerate IPSec packets.
When a user creates a steering rule, [s]he states that we'll carry an
encrypt/decrypt flow action (using a specific configuration) for every
packet which conforms to a certain match. Since currently offloading these
packets is done via FPGA, we'll add another set of flow steering ops.
These ops will execute the required FPGA commands and then call the
standard steering ops.

In order to achieve this, the commands need to receive all the
required information. Therefore, we pass the fte object and embed the
flow_action struct inside the fte. In addition, we add the shim layer
that will later be used for alternating between the standard and the
FPGA steering commands.

Some fixes, like "net/mlx5e: Wait for FPGA command responses with a timeout"
are very relevant for user-space applications, as these applications could
be killed, but we still want to wait for the FPGA and update the kernel's
database.

Regards,
Aviad and Matan

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:56:39 -07:00
Zhu Yanjun
befd8d98f2 IB/rxe: change the function rxe_init_device_param type
The function rxe_init_device_param always returns 0, so its return
type is changed to void.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:56:15 -07:00
Zhu Yanjun
31f1bd14cb IB/rxe: remove unnecessary rxe in rxe_send
In the function rxe_send, the variable rxe is unused,
so it is removed.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:56:14 -07:00
Zhu Yanjun
86af617641 IB/rxe: remove unnecessary skb_clone
In send_atomic_ack function, it is not necessary to make a
skb_clone. To gain better performance (high throughput and
low latency), this skb_clone is removed.

The following tests are made.

 server                       client
---------                    ---------
|1.1.1.1|<----rxe-channel--->|1.1.1.2|
---------                    ---------

On server: rping -s -a 1.1.1.1 -v -C 1000 -S 512
On client: rping -c -a 1.1.1.1 -v -C 1000 -S 512

The kernel config CONFIG_DEBUG_KMEMLEAK is enabled on both server
and client.

This test runs for several hours. There is no memory leak and the whole
system can work well.

Based on the above network, the following tests are made.

Server: ibv_rc_pingpong -d rxe0 -g 1
Client: ibv_rc_pingpong -d rxe0 -g 1 1.1.1.1

The test results on the server (10 tests were run):
Before:
Throughput is 137.07 Mbit/sec
Latency is 517.76 usec/iter

After:
Throughput is 148.85 Mbit/sec
Latency is 476.64 usec/iter

The throughput is enhanced and the latency is reduced.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:56:14 -07:00
Bart Van Assche
63cf1a902c IB/srpt: Add RDMA/CM support
Add a parameter for configuring the port on which the ib_srpt driver
listens for incoming RDMA/CM connections, namely
/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port. The default
value for this parameter is 0 which means "do not listen for incoming
RDMA/CM connections". Add RDMA/CM support to all code that handles
connection state changes. Modify srpt_init_nodeacl() such that ACLs can
be configured for IPv4 and IPv6 addresses.

Note: incoming connection requests are only accepted for ports that
have been enabled. See also the "if (!sport->enabled)" code in the
connection request handler. See also the following configfs attribute:
/sys/kernel/config/target/srpt/$port/$port/enable.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:56:14 -07:00
Aviad Yehezkel
e810bf5e96 net/mlx5: Flow steering cmd interface should get the fte when deleting
Previously, deleting a flow steering entry only got the index.
Since the FPGA implementation of FTE's deletion might need to dig
inside the FTE itself, we would like to get the FTE's context.
Changing the interface to pass the FTE context.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:20:15 -08:00
Boris Pismenny
3346c48737 {net,IB}/mlx5: Add flow steering helpers
Add helper functions that check if a protocol is
part of a flow steering match criteria.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:20:14 -08:00
Matan Barak
d2ec6a35e8 net/mlx5: Embed mlx5_flow_act into fs_fte
fte objects contain the match value and action. Currently, extending
the actions requires adding them to both the API and fs_fte.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:20:13 -08:00
Aviad Yehezkel
5f4183781a net/mlx5: Add empty egress namespace to flow steering core
Currently, we don't support egress flow steering namespace in mlx5
flow steering core implementation. However, when we want to encrypt
a packet, we model it as a flow steering rule in the egress path.
To overcome this, we add an empty egress namespace to flow steering.
This namespace is initialized only when ipsec support exists.
In the future, this will grow to a full-blown flow steering
implementation, resembling the ingress path.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:20:13 -08:00
Matan Barak
af76c50198 net/mlx5: Add shim layer between fs and cmd
The shim layer allows each namespace to define possibly different
functionality for add/delete/update commands. The shim layer
introduced here will be used to support flow steering with the FPGA.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:19:56 -08:00
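
The shim layer described in the commit above can be pictured as a per-namespace table of function pointers. The sketch below is a user-space illustration only: struct fs_cmds, struct fte, the wants_fpga flag and the counters are hypothetical names for this example. The create path follows the order given in the FPGA flow-steering commit (FPGA work first, then the standard command); the order used on delete is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flow table entry; delete receives the whole entry. */
struct fte {
    int index;
    bool wants_fpga;
};

/* Per-namespace command table: the "shim" between fs core and cmd layer. */
struct fs_cmds {
    int (*create_fte)(struct fte *fte);
    int (*delete_fte)(struct fte *fte);
};

static int std_created, std_deleted, fpga_contexts;

static int std_create_fte(struct fte *fte) { (void)fte; std_created++; return 0; }
static int std_delete_fte(struct fte *fte) { (void)fte; std_deleted++; return 0; }

static const struct fs_cmds std_cmds = {
    .create_fte = std_create_fte,
    .delete_fte = std_delete_fte,
};

/* FPGA variant: do the FPGA-specific work, then call the standard command. */
static int fpga_create_fte(struct fte *fte)
{
    if (fte->wants_fpga)
        fpga_contexts++;
    return std_cmds.create_fte(fte);
}

/* Assumed order on delete: standard command first, then FPGA cleanup. */
static int fpga_delete_fte(struct fte *fte)
{
    int err = std_cmds.delete_fte(fte);

    if (!err && fte->wants_fpga)
        fpga_contexts--;
    return err;
}

static const struct fs_cmds fpga_cmds = {
    .create_fte = fpga_create_fte,
    .delete_fte = fpga_delete_fte,
};

struct flow_namespace {
    const struct fs_cmds *cmds;
};

static int ns_add_fte(const struct flow_namespace *ns, struct fte *fte)
{
    assert(ns && ns->cmds);
    return ns->cmds->create_fte(fte);
}

static int ns_del_fte(const struct flow_namespace *ns, struct fte *fte)
{
    assert(ns && ns->cmds);
    return ns->cmds->delete_fte(fte);
}
```

The core code only ever calls ns_add_fte() and ns_del_fte(); which implementation runs depends on the table attached to the namespace, so an egress namespace can use the FPGA table while other namespaces keep the standard one.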
Matan Barak
a9db0ecf15 {net,IB}/mlx5: Add has_tag to mlx5_flow_act
The has_tag member will indicate whether a tag action was specified
in flow specification.

A flow tag of 0 (MLX5_FS_DEFAULT_FLOW_TAG) is treated as a valid flow tag
that is currently used by the mlx5 RDMA driver, whereas in HW flow_tag = 0
means that the user doesn't care about flow_tag.  HW always provides
a flow_tag = 0 if all flow tags requested on a specific flow are 0.

So we need a way (in the driver) to differentiate between a user really
requesting flow_tag = 0 and a user who does not care, in order to be
able to report conflicting flow tags on a specific flow.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:06:33 -08:00
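
A short sketch of the distinction that has_tag makes possible in the commit above: with an explicit flag, a requested tag of 0 can conflict with another tag, while an unspecified tag never does. The struct toy_flow_act and the helper flow_tags_conflict() are hypothetical names for this example; only the has_tag idea comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced version of a flow action. */
struct toy_flow_act {
    bool has_tag;       /* true only if the user explicitly asked for a tag */
    uint32_t flow_tag;  /* meaningful only when has_tag is set */
};

/* Two actions conflict only if both carry an explicit, different tag. */
static bool flow_tags_conflict(const struct toy_flow_act *a,
                               const struct toy_flow_act *b)
{
    assert(a && b);
    return a->has_tag && b->has_tag && a->flow_tag != b->flow_tag;
}
```

Without the flag, flow_tag == 0 would be ambiguous between "tag 0 requested" and "don't care", and the conflict above could not be reported.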
Boris Pismenny
075572d4b7 IB/mlx5: Pass mlx5_flow_act struct instead of multiple arguments
Group and pass all the action arguments of the parse_flow_attr call in one
common struct mlx5_flow_act. This allows us to scale the number of actions
without adding new arguments to the function.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 22:06:11 -08:00
Matan Barak
04e87170b0 net/mlx5: FPGA and IPSec initialization to be before flow steering
Some flow steering namespace initialization (i.e. egress namespace)
might depend on FPGA capabilities. Change the initialization order
so that the FPGA is initialized before flow steering.

Flow steering fs cmds initialization might depend on
IPSec capabilities. Change the initialization order so
that IPSec is initialized before flow steering as well.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:06:10 -08:00
Aviad Yehezkel
1c9a10ebc7 net/mlx5e: Removed not need synchronize_rcu
This is already done by the xfrm layer between the state_dev_del callback
and the state_dev_free callback.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:06:09 -08:00
Aviad Yehezkel
dc7debec07 net/mlx5e: Fixed sleeping inside atomic context
We can't allocate with GFP_KERNEL while holding a spinlock.
ida_simple doesn't actually require the spinlock, so remove it.

Fixes: 547eede070 ("net/mlx5e: IPSec, Innova IPSec offload infrastructure")
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:06:09 -08:00
Aviad Yehezkel
ef927a9c16 net/mlx5e: Wait for FPGA command responses with a timeout
Generally, FPGA IPSec commands must always complete.
We want to wait up to one minute for them to complete gracefully, even
when the process is being killed.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:06:08 -08:00
Aviad Yehezkel
46f3ee4f3a net/mlx5: Fixed compilation issue when CONFIG_MLX5_ACCEL is disabled
IPSec init and cleanup functions also depend on linux/mlx5/driver.h.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:06:08 -08:00
Aviad Yehezkel
c33251a3c6 IB/mlx5: Removed not used parameters
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 22:05:36 -08:00
Gustavo A. R. Silva
63231585a6 RDMA/bnxt_re/qplib_sp: Use true and false for boolean values
Assign true or false to boolean variables instead of an integer value.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 16:00:51 -07:00
Sergey Gorenko
fbd36818ee IB/srp: Use the IB_DEVICE_SG_GAPS_REG HCA feature if supported
If an HCA supports the SG_GAPS_REG feature then fewer memory regions
are required per command. This patch reduces the number of memory
regions that are allocated per SRP session.

Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 16:00:51 -07:00
Bart Van Assche
4190443947 IB/hfi1: Add a missing rcu_read_unlock()
This patch prevents sparse from reporting the following:

drivers/infiniband/hw/hfi1/driver.c:251:13: warning: context imbalance in 'rcv_hdrerr' - different lock contexts for basic block

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 16:00:51 -07:00
Arushi
666fe24bbe infiniband: hw: Drop unnecessary continue
Continue statements at the bottom of a loop are removed.
Issue found using drop_continue.cocci Coccinelle script.

Signed-off-by: Arushi Singhal <arushisinghal19971997@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 16:00:51 -07:00
Shiraz Saleem
7e952b19eb i40iw: Implement get_vector_affinity API
Storage ULPs (like NVMEoF) benefit from exposing affinity mapping
per completion vector to find the optimal multi-queue affinity
assignments. The ULPs call the verbs API ib_get_vector_affinity
introduced in commit c66cd353bb ("RDMA/core: expose affinity mappings per
completion vector") to get the underlying device's affinity mappings.

Add support in the driver to expose the affinity masks per MSI-X
completion vector.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 16:00:51 -07:00
Shiraz Saleem
7de8b3576a i40iw: Improve CM node lookup time on connection setup
Currently all CM nodes involved in a connection are
maintained in a per-device connected_node list. During
connection setup, we need to search this list every time
we receive a packet on the iWARP LAN Queue (ILQ), which
can be quite inefficient for a large number of
connections.

Fix this by organizing the CM nodes in two lists -
accelerated list and non-accelerated list. The search
on ILQ receive would be limited to only non accelerated
nodes. When a node moves to RTS, it is added to the
accelerated list.

Benchmarking ucmatose 16k connections shows a 20%
improvement in test completion time.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 16:00:51 -07:00
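
The two-list scheme from the CM node lookup commit above can be sketched with plain singly linked lists. This is a simplified user-space model, not the driver's code: struct cm_node, struct cm_core and the three helpers are hypothetical names, nodes are keyed by a bare integer id, and locking is omitted entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct cm_node {
    int id;
    struct cm_node *next;
};

struct cm_core {
    struct cm_node *non_accel;  /* nodes still in connection setup */
    struct cm_node *accel;      /* nodes that already reached RTS */
};

/* New nodes always start on the non-accelerated list. */
static void cm_add_node(struct cm_core *core, struct cm_node *node)
{
    assert(core && node);
    node->next = core->non_accel;
    core->non_accel = node;
}

/* Lookup on ILQ receive: only the non-accelerated list is searched. */
static struct cm_node *cm_find_on_ilq(const struct cm_core *core, int id)
{
    for (struct cm_node *n = core->non_accel; n; n = n->next)
        if (n->id == id)
            return n;
    return NULL;
}

/* When a node moves to RTS, unlink it and push it on the accelerated list. */
static bool cm_move_to_rts(struct cm_core *core, int id)
{
    for (struct cm_node **pp = &core->non_accel; *pp; pp = &(*pp)->next) {
        struct cm_node *n = *pp;

        if (n->id != id)
            continue;
        *pp = n->next;
        n->next = core->accel;
        core->accel = n;
        return true;
    }
    return false;
}
```

The search cost during setup then scales with the number of connections still being established rather than with all connections on the device.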
Mustafa Ismail
6b0c549fc6 i40iw: Refactor handling of txpend list
Currently the TX pending lists for IEQ and ILQ are
handled separately. The handling of both can be
consolidated in i40iw_poll_completion.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 16:00:51 -07:00
Bart Van Assche
2a78cb4db4 IB/srpt: Fix an out-of-bounds stack access in srpt_zerolength_write()
Avoid triggering an out-of-bounds stack access by changing the type
of 'wr' from ib_send_wr into ib_rdma_wr.

This patch fixes the following KASAN bug report:

BUG: KASAN: stack-out-of-bounds in rxe_post_send+0x7a9/0x9a0 [rdma_rxe]
Read of size 8 at addr ffff880068197a48 by task kworker/2:1/44

Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
 dump_stack+0x8e/0xcd
 print_address_description+0x6f/0x280
 kasan_report+0x25a/0x380
 __asan_load8+0x54/0x90
 rxe_post_send+0x7a9/0x9a0 [rdma_rxe]
 srpt_zerolength_write+0xf0/0x180 [ib_srpt]
 srpt_cm_rtu_recv+0x68/0x110 [ib_srpt]
 srpt_rdma_cm_handler+0xbb/0x15b [ib_srpt]
 cma_ib_handler+0x1aa/0x4a0 [rdma_cm]
 cm_process_work+0x30/0x100 [ib_cm]
 cm_work_handler+0xa86/0x351b [ib_cm]
 process_one_work+0x475/0x9f0
 worker_thread+0x69/0x690
 kthread+0x1ad/0x1d0
 ret_from_fork+0x3a/0x50

Fixes: aaf45bd83e ("IB/srpt: Detect session shutdown reliably")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 16:00:51 -07:00
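
The type change in the srpt_zerolength_write() commit above matters because a provider that handles an RDMA write looks at fields stored after the embedded base request. The sketch below models that layout in user space. The toy_* types and helpers are hypothetical simplifications written for this example; they only imitate the embedding pattern and are not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_opcode { TOY_SEND, TOY_RDMA_WRITE };

/* Base work request, embedded as the first member of the RDMA variant. */
struct toy_send_wr {
    enum toy_opcode opcode;
};

/* RDMA work request: base request plus RDMA-only fields. */
struct toy_rdma_wr {
    struct toy_send_wr wr;
    uint64_t remote_addr;
    uint32_t rkey;
};

/* Recover the enclosing RDMA request from a pointer to its base member. */
static const struct toy_rdma_wr *to_rdma_wr(const struct toy_send_wr *wr)
{
    return (const struct toy_rdma_wr *)
        ((const char *)wr - offsetof(struct toy_rdma_wr, wr));
}

/*
 * What a provider does for an RDMA write: it reads fields beyond the base
 * struct. This is only valid if the caller really allocated a toy_rdma_wr;
 * a bare toy_send_wr on the stack would make this read out of bounds.
 */
static uint64_t provider_remote_addr(const struct toy_send_wr *wr)
{
    assert(wr && wr->opcode == TOY_RDMA_WRITE);
    return to_rdma_wr(wr)->remote_addr;
}
```

Declaring the request with the larger type, as the fix does, guarantees that the memory the provider reads belongs to the caller's object.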
Bart Van Assche
a6544a624c RDMA/rxe: Fix an out-of-bounds read
This patch prevents KASAN from reporting the following when the SRP initiator
calls srp_post_send():

==================================================================
BUG: KASAN: stack-out-of-bounds in rxe_post_send+0x5c4/0x980 [rdma_rxe]
Read of size 8 at addr ffff880066606e30 by task 02-mq/1074

CPU: 2 PID: 1074 Comm: 02-mq Not tainted 4.16.0-rc3-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0x85/0xc7
print_address_description+0x65/0x270
kasan_report+0x231/0x350
rxe_post_send+0x5c4/0x980 [rdma_rxe]
srp_post_send.isra.16+0x149/0x190 [ib_srp]
srp_queuecommand+0x94d/0x1670 [ib_srp]
scsi_dispatch_cmd+0x1c2/0x550 [scsi_mod]
scsi_queue_rq+0x843/0xa70 [scsi_mod]
blk_mq_dispatch_rq_list+0x143/0xac0
blk_mq_do_dispatch_ctx+0x1c5/0x260
blk_mq_sched_dispatch_requests+0x2bf/0x2f0
__blk_mq_run_hw_queue+0xdb/0x160
__blk_mq_delay_run_hw_queue+0xba/0x100
blk_mq_run_hw_queue+0xf2/0x190
blk_mq_sched_insert_request+0x163/0x2f0
blk_execute_rq+0xb0/0x130
scsi_execute+0x14e/0x260 [scsi_mod]
scsi_probe_and_add_lun+0x366/0x13d0 [scsi_mod]
__scsi_scan_target+0x18a/0x810 [scsi_mod]
scsi_scan_target+0x11e/0x130 [scsi_mod]
srp_create_target+0x1522/0x19e0 [ib_srp]
kernfs_fop_write+0x180/0x210
__vfs_write+0xb1/0x2e0
vfs_write+0xf6/0x250
SyS_write+0x99/0x110
do_syscall_64+0xee/0x2b0
entry_SYSCALL_64_after_hwframe+0x42/0xb7

The buggy address belongs to the page:
page:ffffea0001998180 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x4000000000000000()
raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff880066606d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1
ffff880066606d80: f1 00 f2 f2 f2 f2 f2 f2 f2 00 00 f2 f2 f2 f2 f2
>ffff880066606e00: f2 00 00 00 00 00 f2 f2 f2 f3 f3 f3 f3 00 00 00
                                    ^
ffff880066606e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880066606f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Moni Shoua <monis@mellanox.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 16:00:51 -07:00