RDMA 5.8 merge window pull request

A few large, long discussed works this time. The RNBD block driver has
 been posted for nearly two years now, and the removal of FMR has been a
 recurring discussion theme for a long time. The usual smattering of
 features and bug fixes.
 
 - Various small driver bug fixes in rxe, mlx5, hfi1, and efa
 
 - Continuing driver cleanups in bnxt_re, hns
 
 - Big cleanup of mlx5 QP creation flows
 
 - More consistent use of src port and flow label when LAG is used, and an
   mlx5 implementation
 
 - Additional set of cleanups for IB CM
 
 - 'RNBD' network block driver and target. This is a network block RDMA
   device specific to ionos's cloud environment. It brings strong multipath
   and resiliency capabilities.
 
 - Accelerated IPoIB for HFI1
 
 - QP/WQ/SRQ ioctl migration for uverbs, and support for multiple async fds
 
 - Support for exchanging the new IBTA defined ECE data during RDMA CM
   exchanges
 
 - Removal of the very old and insecure FMR interface from all ULPs and
   drivers. FRWR has been preferred for at least a decade now.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl7X/IwACgkQOG33FX4g
 mxp2uw/+MI2S/aXqEBvZfTT8yrkAwqYezS0VeTDnwH/T6UlTMDhHVN/2Ji3tbbX3
 FEKT1i2mnAL5RqUAL1lr9g4sG/bVozrpN46Ws5Lu9dTbIPLKTNPWDuLFQDUShKY7
 OyMI/bRx6anGnsOy20iiBqnrQbrrZj5TECgnmrkAl62QFdcl7aBWe/yYjy4CT11N
 ub+aBXBREN1F1pc0HIjd2tI+8gnZc+mNm1LVVDRH9Capun/pI26qDNh7e6QwGyIo
 n8ItraC8znLwv/nsUoTE7/JRcsTEe6vJI26PQmczZfNJs/4O65G7fZg0eSBseZYi
 qKf7Uwtb3qW0R7jRUMEgFY4DKXVAA0G2ph40HXBuzOSsqlT6HqYMO2wgG8pJkrTc
 qAjoSJGzfAHIsjxzxKI8wKuufCddjCm30VWWU7EKeriI6h1J0uPVqKkQMfYBTkik
 696eZSBycAVgwayOng3XaehiTxOL7qGMTjUpDjUR6UscbiPG919vP+QsbIUuBXdb
 YoddBQJdyGJiaCXv32ciJjo9bjPRRi/bII7Q5qzCNI2mi4ZVbudF4ffzyQvdHtNJ
 nGnpRXoPi7kMvUrKTMPWkFjj0R5/UsPszsA51zbxPydfgBe0Dlc2PrrIG8dlzYAp
 wbV0Lec+iJucKlt7EZtrjz1xOiOOaQt/5/cW1bWqL+wk2t6gAuY=
 =9zTe
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A more active cycle than most of the recent past, with a few large,
  long discussed works this time.

  The RNBD block driver has been posted for nearly two years now, and
  flowing through RDMA due to it also introducing a new ULP.

  The removal of FMR has been a recurring discussion theme for a long
  time.

  And the usual smattering of features and bug fixes.

  Summary:

   - Various small driver bug fixes in rxe, mlx5, hfi1, and efa

   - Continuing driver cleanups in bnxt_re, hns

   - Big cleanup of mlx5 QP creation flows

   - More consistent use of src port and flow label when LAG is used, and
     an mlx5 implementation

   - Additional set of cleanups for IB CM

   - 'RNBD' network block driver and target. This is a network block
     RDMA device specific to ionos's cloud environment. It brings strong
     multipath and resiliency capabilities.

   - Accelerated IPoIB for HFI1

   - QP/WQ/SRQ ioctl migration for uverbs, and support for multiple
     async fds

   - Support for exchanging the new IBTA defined ECE data during RDMA CM
     exchanges

   - Removal of the very old and insecure FMR interface from all ULPs
     and drivers. FRWR has been preferred for at least a decade now"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (247 commits)
  RDMA/cm: Spurious WARNING triggered in cm_destroy_id()
  RDMA/mlx5: Return ECE DC support
  RDMA/mlx5: Don't rely on FW to set zeros in ECE response
  RDMA/mlx5: Return an error if copy_to_user fails
  IB/hfi1: Use free_netdev() in hfi1_netdev_free()
  RDMA/hns: Uninitialized variable in modify_qp_init_to_rtr()
  RDMA/core: Move and rename trace_cm_id_create()
  IB/hfi1: Fix hfi1_netdev_rx_init() error handling
  RDMA: Remove 'max_map_per_fmr'
  RDMA: Remove 'max_fmr'
  RDMA/core: Remove FMR device ops
  RDMA/rdmavt: Remove FMR memory registration
  RDMA/mthca: Remove FMR support for memory registration
  RDMA/mlx4: Remove FMR support for memory registration
  RDMA/i40iw: Remove FMR leftovers
  RDMA/bnxt_re: Remove FMR leftovers
  RDMA/mlx5: Remove FMR leftovers
  RDMA/core: Remove FMR pool API
  RDMA/rds: Remove FMR support for memory registration
  RDMA/srp: Remove support for FMR memory registration
  ...
This commit is contained in:
Linus Torvalds 2020-06-05 14:05:57 -07:00
commit 242b233198
244 changed files with 23842 additions and 10837 deletions


@@ -0,0 +1,46 @@
What: /sys/block/rnbd<N>/rnbd/unmap_device
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: To unmap a volume, "normal" or "force" has to be written to:
/sys/block/rnbd<N>/rnbd/unmap_device
When "normal" is used, the operation will fail with EBUSY if any process
is using the device. When "force" is used, the device is unmapped even
when it is in use. All I/Os that are in progress will fail.
Example:
# echo "normal" > /sys/block/rnbd0/rnbd/unmap_device
What: /sys/block/rnbd<N>/rnbd/state
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: The file contains the current state of the block device. The state file
returns "open" when the device is successfully mapped from the server
and accepting I/O requests. When the connection to the server is lost
due to an error (e.g. link failure), the state file returns "closed"
and all I/O requests submitted to it will fail with -EIO.
What: /sys/block/rnbd<N>/rnbd/session
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: RNBD uses an RTRS session to transport the data between client and
server. The entry "session" contains the name of the session that
was used to establish the RTRS session. It's the same name that
was passed as the server parameter to the map_device entry.
What: /sys/block/rnbd<N>/rnbd/mapping_path
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: Contains the path that was passed as "device_path" to the map_device
operation.
What: /sys/block/rnbd<N>/rnbd/access_mode
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: Contains the device access mode: ro, rw or migration.


@@ -0,0 +1,111 @@
What: /sys/class/rnbd-client
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: Provide information about RNBD-client.
All sysfs files that are not read-only provide the usage information on read:
Example:
# cat /sys/class/rnbd-client/ctl/map_device
> Usage: echo "sessname=<name of the rtrs session> path=<[srcaddr,]dstaddr>
> [path=<[srcaddr,]dstaddr>] device_path=<full path on remote side>
> [access_mode=<ro|rw|migration>] > map_device
>
> addr ::= [ ip:<ipv4> | ip:<ipv6> | gid:<gid> ]
What: /sys/class/rnbd-client/ctl/map_device
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: Expected format is the following:
sessname=<name of the rtrs session>
path=<[srcaddr,]dstaddr> [path=<[srcaddr,]dstaddr> ...]
device_path=<full path on remote side>
[access_mode=<ro|rw|migration>]
Where:
sessname: accepts a string not bigger than 256 chars, which identifies
a given session on the client and on the server.
E.g. "clt_hostname-srv_hostname" could be a natural choice.
path: describes a connection between the client and the server by
specifying destination and, when required, the source address.
The addresses are to be provided in the following format:
ip:<IPv6>
ip:<IPv4>
gid:<GID>
for example:
path=ip:10.0.0.66
The single addr is treated as the destination.
The connection will be established to this server from any client IP address.
path=ip:10.0.0.66,ip:10.0.1.66
First addr is the source address and the second is the destination.
If multiple "path=" options are specified, multiple connections
will be established and data will be sent according to
the selected multipath policy (see RTRS mp_policy sysfs entry description).
device_path: Path to the block device on the server side. Path is specified
relative to the directory on server side configured in the
'dev_search_path' module parameter of the rnbd_server.
The rnbd_server prepends the <device_path> received from client
with <dev_search_path> and tries to open the
<dev_search_path>/<device_path> block device. On success,
a /dev/rnbd<N> device file, a /sys/block/rnbd_client/rnbd<N>/
directory and an entry in /sys/class/rnbd-client/ctl/devices
will be created.
If 'dev_search_path' contains '%SESSNAME%', then each session can
have a different device namespace. E.g. if the server was configured
with "dev_search_path=/run/rnbd-devs/%SESSNAME%" and the client passed
"sessname=blya device_path=sda", then the server will try to open:
/run/rnbd-devs/blya/sda.
access_mode: the access_mode parameter specifies if the device is to be
mapped as "ro" read-only or "rw" read-write. The server allows
a device to be exported in rw mode only once. The "migration"
access mode has to be specified if a second mapping in read-write
mode is desired.
By default "rw" is used.
Exit Codes:
If the device is already mapped it will fail with EEXIST. If the input
has an invalid format it will return EINVAL. If the device path cannot
be found on the server, it will fail with ENOENT.
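Putting the format together, a complete map_device request can be composed as below. The session name, address and device path are example values, and the actual write into sysfs is left commented out since it requires a live RNBD setup:

```shell
# Compose a map_device request string from its documented parts.
sessname="clt_hostname-srv_hostname"   # example session name
path="ip:10.0.0.66"                    # destination-only path
device_path="ram0"                     # relative to the server's dev_search_path
req="sessname=$sessname path=$path device_path=$device_path access_mode=ro"
echo "$req"
# On a live system: echo "$req" > /sys/class/rnbd-client/ctl/map_device
```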
Finding device file after mapping
---------------------------------
After mapping, the device file can be found by:
o The symlink /sys/class/rnbd-client/ctl/devices/<device_id>
points to /sys/block/<dev-name>. The last part of the symlink destination
is the same as the device name. By extracting the last part of the
path, the path to the device /dev/<dev-name> can be built.
o /dev/block/$(cat /sys/class/rnbd-client/ctl/devices/<device_id>/dev)
How to find the <device_id> of the device is described in the next
section.
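The symlink-based lookup can be sketched in shell. The symlink target below is a sample value; on a live system it would come from readlink on the devices/<device_id> entry:

```shell
# The last path component of the /sys/block/<dev-name> symlink target
# is the device name, from which the /dev node path is built.
symlink_target="/sys/block/rnbd0"   # sample value, not read from live sysfs
dev_name=$(basename "$symlink_target")
echo "/dev/$dev_name"               # /dev/rnbd0
```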
What: /sys/class/rnbd-client/ctl/devices/
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: For each device mapped on the client a new symbolic link is created as
/sys/class/rnbd-client/ctl/devices/<device_id>, which points
to the block device created by rnbd (/sys/block/rnbd<N>/).
The <device_id> of each device is created as follows:
- If the 'device_path' provided during mapping contains slashes ("/"),
they are replaced by an exclamation mark ("!") and used as the
<device_id>. Otherwise, the <device_id> will be the same as the
"device_path" provided.
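The slash-to-exclamation-mark mapping can be illustrated with a short shell snippet; the device_path value is hypothetical:

```shell
# A device_path containing slashes maps to a <device_id> with '!' instead.
device_path="ramdisks/ram0"                    # hypothetical mapped path
device_id=$(printf '%s' "$device_path" | tr '/' '!')
echo "$device_id"                              # ramdisks!ram0
```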


@@ -0,0 +1,50 @@
What: /sys/class/rnbd-server
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: Provides information about the RNBD-server.
What: /sys/class/rnbd-server/ctl/
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: When a client maps a device, a directory entry with the name of the
block device is created under /sys/class/rnbd-server/ctl/devices/.
What: /sys/class/rnbd-server/ctl/devices/<device_name>/block_dev
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: Is a symlink to the sysfs entry of the exported device.
Example:
block_dev -> ../../../../class/block/ram0
What: /sys/class/rnbd-server/ctl/devices/<device_name>/sessions/
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: For each client a particular device is exported to, the following
directory will be created:
/sys/class/rnbd-server/ctl/devices/<device_name>/sessions/<session-name>/
When the device is unmapped by that client, the directory will be removed.
What: /sys/class/rnbd-server/ctl/devices/<device_name>/sessions/<session-name>/read_only
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: Contains '1' if device is mapped read-only, otherwise '0'.
What: /sys/class/rnbd-server/ctl/devices/<device_name>/sessions/<session-name>/mapping_path
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: Contains the relative device path provided by the user during mapping.
What: /sys/class/rnbd-server/ctl/devices/<device_name>/sessions/<session-name>/access_mode
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: Contains the device access mode: ro, rw or migration.


@@ -0,0 +1,131 @@
What: /sys/class/rtrs-client
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: When a user of the RTRS API creates a new session, a directory entry with
the name of that session is created under /sys/class/rtrs-client/<session-name>/
What: /sys/class/rtrs-client/<session-name>/add_path
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: RW, adds a new path (connection) to an existing session. Expected format is the
following:
<[source addr,]destination addr>
*addr ::= [ ip:<ipv4|ipv6> | gid:<gid> ]
What: /sys/class/rtrs-client/<session-name>/max_reconnect_attempts
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: Maximum number of reconnect attempts the client should make before
giving up after the connection breaks unexpectedly.
What: /sys/class/rtrs-client/<session-name>/mp_policy
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: Multipath policy specifies which path should be selected on each IO:
round-robin (0):
select the path in a per-CPU round-robin manner.
min-inflight (1):
select the path with the minimum number of inflight requests.
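The min-inflight idea can be illustrated with a toy shell sketch. This is only an illustration of the selection rule, not the kernel's implementation, and the path names and inflight counts are made up:

```shell
# Pick the path with the fewest inflight requests.
# Format: "<path-name> <inflights>" per line; sample values.
paths="pathA 7
pathB 2
pathC 5"
best=$(printf '%s\n' "$paths" | sort -k2 -n | head -n1 | cut -d' ' -f1)
echo "$best"    # pathB
```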
What: /sys/class/rtrs-client/<session-name>/paths/
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: Each path belonging to a given session is listed here by its source and
destination address. When a new path is added to a session by writing to
the "add_path" entry, a directory <src@dst> is created.
What: /sys/class/rtrs-client/<session-name>/paths/<src@dst>/state
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: RO, Contains "connected" if the session is connected to the peer and
fully functional. Otherwise the file contains "disconnected".
What: /sys/class/rtrs-client/<session-name>/paths/<src@dst>/reconnect
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: Write "1" to the file in order to reconnect the path.
Operation is blocking and returns 0 if reconnect was successful.
What: /sys/class/rtrs-client/<session-name>/paths/<src@dst>/disconnect
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: Write "1" to the file in order to disconnect the path.
Operation blocks until RTRS path is disconnected.
What: /sys/class/rtrs-client/<session-name>/paths/<src@dst>/remove_path
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: Write "1" to the file in order to disconnect and remove the path
from the session. Operation blocks until the path is disconnected
and removed from the session.
What: /sys/class/rtrs-client/<session-name>/paths/<src@dst>/hca_name
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: RO, Contains the name of the HCA the connection is established on.
What: /sys/class/rtrs-client/<session-name>/paths/<src@dst>/hca_port
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: RO, Contains the port number of the active port the traffic is going through.
What: /sys/class/rtrs-client/<session-name>/paths/<src@dst>/src_addr
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: RO, Contains the source address of the path.
What: /sys/class/rtrs-client/<session-name>/paths/<src@dst>/dst_addr
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: RO, Contains the destination address of the path.
What: /sys/class/rtrs-client/<session-name>/paths/<src@dst>/stats/reset_all
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: RW, Read will return usage help, write 0 will clear all the statistics.
What: /sys/class/rtrs-client/<session-name>/paths/<src@dst>/stats/cpu_migration
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: RTRS expects that each HCA IRQ is pinned to a separate CPU. If it's
not the case, an I/O response could be processed on a different CPU
than the one where the I/O was originally submitted. This file shows
how many interrupts were generated on an unexpected CPU.
"from:" is the CPU on which the IRQ was expected, but not generated.
"to:" is the CPU on which the IRQ was generated, but not expected.
What: /sys/class/rtrs-client/<session-name>/paths/<src@dst>/stats/reconnects
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: Contains 2 unsigned int values: the first records the number of
successful reconnects in the path lifetime, the second the number
of failed reconnects in the path lifetime.
What: /sys/class/rtrs-client/<session-name>/paths/<src@dst>/stats/rdma
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: Contains statistics regarding rdma operations and inflight operations.
The output consists of 6 values:
<read-count> <read-total-size> <write-count> <write-total-size> \
<inflights> <failovered>
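A stats line in the documented order can be split into named fields in shell. The sample line below uses made-up values, not output from a real session:

```shell
# Split the 6 whitespace-separated values of the stats/rdma file.
stats_line="100 409600 50 204800 3 1"   # made-up sample, documented field order
set -- $stats_line
echo "reads=$1 read-bytes=$2 writes=$3 write-bytes=$4 inflights=$5 failovered=$6"
```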


@@ -0,0 +1,53 @@
What: /sys/class/rtrs-server
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: When a user of the RTRS API creates a new session on the client side, a
directory entry with the name of that session is created here.
What: /sys/class/rtrs-server/<session-name>/paths/
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: When a new path is created by writing to the "add_path" entry on the
client side, a directory entry named <source address>@<destination address>
is created on the server.
What: /sys/class/rtrs-server/<session-name>/paths/<src@dst>/disconnect
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: When "1" is written to the file, the RTRS session is disconnected.
The operation is non-blocking and returns control immediately to the caller.
What: /sys/class/rtrs-server/<session-name>/paths/<src@dst>/hca_name
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: RO, Contains the name of the HCA the connection is established on.
What: /sys/class/rtrs-server/<session-name>/paths/<src@dst>/hca_port
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: RO, Contains the port number of the active port the traffic is going through.
What: /sys/class/rtrs-server/<session-name>/paths/<src@dst>/src_addr
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: RO, Contains the source address of the path.
What: /sys/class/rtrs-server/<session-name>/paths/<src@dst>/dst_addr
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: RO, Contains the destination address of the path.
What: /sys/class/rtrs-server/<session-name>/paths/<src@dst>/stats/rdma
Date: Feb 2020
KernelVersion: 5.7
Contact: Jack Wang <jinpu.wang@cloud.ionos.com> Danil Kipnis <danil.kipnis@cloud.ionos.com>
Description: Contains statistics regarding rdma operations and inflight operations.
The output consists of 5 values:
<read-count> <read-total-size> <write-count> <write-total-size> <inflights>


@@ -37,9 +37,6 @@ InfiniBand core interfaces
.. kernel-doc:: drivers/infiniband/core/ud_header.c
:export:
.. kernel-doc:: drivers/infiniband/core/fmr_pool.c
:export:
.. kernel-doc:: drivers/infiniband/core/umem.c
:export:


@@ -22,7 +22,6 @@ Sleeping and interrupt context
- post_recv
- poll_cq
- req_notify_cq
- map_phys_fmr
which may not sleep and must be callable from any context.
@@ -36,7 +35,6 @@ Sleeping and interrupt context
- ib_post_send
- ib_post_recv
- ib_req_notify_cq
- ib_map_phys_fmr
are therefore safe to call from any context.


@@ -14579,6 +14579,13 @@ F: arch/riscv/
N: riscv
K: riscv
RNBD BLOCK DRIVERS
M: Danil Kipnis <danil.kipnis@cloud.ionos.com>
M: Jack Wang <jinpu.wang@cloud.ionos.com>
L: linux-block@vger.kernel.org
S: Maintained
F: drivers/block/rnbd/
ROCCAT DRIVERS
M: Stefan Achatz <erazor_de@users.sourceforge.net>
S: Maintained
@@ -14716,6 +14723,13 @@ S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/jes/linux.git rtl8xxxu-devel
F: drivers/net/wireless/realtek/rtl8xxxu/
RTRS TRANSPORT DRIVERS
M: Danil Kipnis <danil.kipnis@cloud.ionos.com>
M: Jack Wang <jinpu.wang@cloud.ionos.com>
L: linux-rdma@vger.kernel.org
S: Maintained
F: drivers/infiniband/ulp/rtrs/
RXRPC SOCKETS (AF_RXRPC)
M: David Howells <dhowells@redhat.com>
L: linux-afs@lists.infradead.org


@@ -458,4 +458,6 @@ config BLK_DEV_RSXX
To compile this driver as a module, choose M here: the
module will be called rsxx.
source "drivers/block/rnbd/Kconfig"
endif # BLK_DEV


@@ -39,6 +39,7 @@ obj-$(CONFIG_BLK_DEV_PCIESSD_MTIP32XX) += mtip32xx/
obj-$(CONFIG_BLK_DEV_RSXX) += rsxx/
obj-$(CONFIG_ZRAM) += zram/
obj-$(CONFIG_BLK_DEV_RNBD) += rnbd/
obj-$(CONFIG_BLK_DEV_NULL_BLK) += null_blk.o
null_blk-objs := null_blk_main.o


@@ -0,0 +1,28 @@
# SPDX-License-Identifier: GPL-2.0-or-later
config BLK_DEV_RNBD
bool
config BLK_DEV_RNBD_CLIENT
tristate "RDMA Network Block Device driver client"
depends on INFINIBAND_RTRS_CLIENT
select BLK_DEV_RNBD
help
RNBD client is a network block device driver using the RDMA transport.
The RNBD client allows for mapping of remote block devices over the
RTRS protocol from a target system where the RNBD server is running.
If unsure, say N.
config BLK_DEV_RNBD_SERVER
tristate "RDMA Network Block Device driver server"
depends on INFINIBAND_RTRS_SERVER
select BLK_DEV_RNBD
help
RNBD server is the server side of RNBD using the RDMA transport.
The RNBD server allows for exporting local block devices to a remote
client over the RTRS protocol.
If unsure, say N.


@@ -0,0 +1,15 @@
# SPDX-License-Identifier: GPL-2.0-or-later
ccflags-y := -I$(srctree)/drivers/infiniband/ulp/rtrs
rnbd-client-y := rnbd-clt.o \
rnbd-clt-sysfs.o \
rnbd-common.o
rnbd-server-y := rnbd-common.o \
rnbd-srv.o \
rnbd-srv-dev.o \
rnbd-srv-sysfs.o
obj-$(CONFIG_BLK_DEV_RNBD_CLIENT) += rnbd-client.o
obj-$(CONFIG_BLK_DEV_RNBD_SERVER) += rnbd-server.o

drivers/block/rnbd/README

@@ -0,0 +1,92 @@
********************************
RDMA Network Block Device (RNBD)
********************************
Introduction
------------
RNBD (RDMA Network Block Device) is a pair of kernel modules
(client and server) that allow for remote access of a block device on
the server over RTRS protocol using the RDMA (InfiniBand, RoCE, iWARP)
transport. After being mapped, the remote block devices can be accessed
on the client side as local block devices.
I/O is transferred between client and server by the RTRS transport
modules. The administration of RNBD and RTRS modules is done via
sysfs entries.
Requirements
------------
RTRS kernel modules
Quick Start
-----------
Server side:
# modprobe rnbd_server
Client side:
# modprobe rnbd_client
# echo "sessname=blya path=ip:10.50.100.66 device_path=/dev/ram0" > \
/sys/devices/virtual/rnbd-client/ctl/map_device
Where "sessname=" is a session name, a string to identify the session
on the client and server sides; "path=" is a destination IP address or
a pair of source and destination IPs, separated by a comma. Multiple
"path=" options can be specified in order to use multipath (see RTRS
description for details); "device_path=" is the block device to be
mapped from the server side. After the session to the server machine is
established, the mapped device will appear on the client side under
/dev/rnbd<N>.
RNBD-Server Module Parameters
=============================
dev_search_path
---------------
When a device is mapped from the client, the server generates the path
to the block device on the server side by concatenating dev_search_path
and the "device_path" that was specified in the map_device operation.
The default dev_search_path is: "/".
dev_search_path option can also contain %SESSNAME% in order to provide
different device namespaces for different sessions. See "device_path"
option for details.
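The server-side path construction can be sketched in shell. The kernel module does this internally; the parameter and session values below are the examples used elsewhere in this README:

```shell
# Build the full device path: substitute %SESSNAME% in dev_search_path,
# then append the client-supplied device_path.
dev_search_path="/run/rnbd-devs/%SESSNAME%"   # example module parameter
sessname="blya"
device_path="sda"
base=$(printf '%s' "$dev_search_path" | sed "s/%SESSNAME%/$sessname/")
echo "$base/$device_path"                     # /run/rnbd-devs/blya/sda
```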
============================
Protocol (rnbd/rnbd-proto.h)
============================
1. Before mapping the first device from a given server, the client sends an
RNBD_MSG_SESS_INFO message to the server. The server responds with
RNBD_MSG_SESS_INFO_RSP. Currently the messages only contain the protocol
version for backward compatibility.
2. The client requests to open a device by sending an RNBD_MSG_OPEN message. This
contains the path to the device and the access mode (read-only or writable).
The server responds to the message with RNBD_MSG_OPEN_RSP. This contains
a 32-bit device id to be used for I/Os and device "geometry" related
information: size, max_hw_sectors, etc.
3. The client attaches an RNBD_MSG_IO to each I/O message sent to a device. This
message contains the device id provided by the server in its RNBD_MSG_OPEN_RSP,
the sector to be accessed, read-write flags and bi_size.
4. The client closes a device by sending RNBD_MSG_CLOSE, which contains only the
device id provided by the server.
=========================================
Contributors List (in alphabetical order)
=========================================
Danil Kipnis <danil.kipnis@profitbricks.com>
Fabian Holler <mail@fholler.de>
Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Jack Wang <jinpu.wang@profitbricks.com>
Kleber Souza <kleber.souza@profitbricks.com>
Lutz Pogrell <lutz.pogrell@cloud.ionos.com>
Milind Dumbare <Milind.dumbare@gmail.com>
Roman Penyaev <roman.penyaev@profitbricks.com>


@@ -0,0 +1,639 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* RDMA Network Block Driver
*
* Copyright (c) 2014 - 2018 ProfitBricks GmbH. All rights reserved.
* Copyright (c) 2018 - 2019 1&1 IONOS Cloud GmbH. All rights reserved.
* Copyright (c) 2019 - 2020 1&1 IONOS SE. All rights reserved.
*/
#undef pr_fmt
#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
#include <linux/types.h>
#include <linux/ctype.h>
#include <linux/parser.h>
#include <linux/module.h>
#include <linux/in6.h>
#include <linux/fs.h>
#include <linux/uaccess.h>
#include <linux/device.h>
#include <rdma/ib.h>
#include <rdma/rdma_cm.h>
#include "rnbd-clt.h"
static struct device *rnbd_dev;
static struct class *rnbd_dev_class;
static struct kobject *rnbd_devs_kobj;
enum {
RNBD_OPT_ERR = 0,
RNBD_OPT_DEST_PORT = 1 << 0,
RNBD_OPT_PATH = 1 << 1,
RNBD_OPT_DEV_PATH = 1 << 2,
RNBD_OPT_ACCESS_MODE = 1 << 3,
RNBD_OPT_SESSNAME = 1 << 6,
};
static const unsigned int rnbd_opt_mandatory[] = {
RNBD_OPT_PATH,
RNBD_OPT_DEV_PATH,
RNBD_OPT_SESSNAME,
};
static const match_table_t rnbd_opt_tokens = {
{RNBD_OPT_PATH, "path=%s" },
{RNBD_OPT_DEV_PATH, "device_path=%s"},
{RNBD_OPT_DEST_PORT, "dest_port=%d" },
{RNBD_OPT_ACCESS_MODE, "access_mode=%s"},
{RNBD_OPT_SESSNAME, "sessname=%s" },
{RNBD_OPT_ERR, NULL },
};
struct rnbd_map_options {
char *sessname;
struct rtrs_addr *paths;
size_t *path_cnt;
char *pathname;
u16 *dest_port;
enum rnbd_access_mode *access_mode;
};
static int rnbd_clt_parse_map_options(const char *buf, size_t max_path_cnt,
struct rnbd_map_options *opt)
{
char *options, *sep_opt;
char *p;
substring_t args[MAX_OPT_ARGS];
int opt_mask = 0;
int token;
int ret = -EINVAL;
int i, dest_port;
int p_cnt = 0;
options = kstrdup(buf, GFP_KERNEL);
if (!options)
return -ENOMEM;
sep_opt = strstrip(options);
while ((p = strsep(&sep_opt, " ")) != NULL) {
if (!*p)
continue;
token = match_token(p, rnbd_opt_tokens, args);
opt_mask |= token;
switch (token) {
case RNBD_OPT_SESSNAME:
p = match_strdup(args);
if (!p) {
ret = -ENOMEM;
goto out;
}
if (strlen(p) > NAME_MAX) {
pr_err("map_device: sessname too long\n");
ret = -EINVAL;
kfree(p);
goto out;
}
strlcpy(opt->sessname, p, NAME_MAX);
kfree(p);
break;
case RNBD_OPT_PATH:
if (p_cnt >= max_path_cnt) {
pr_err("map_device: too many (> %zu) paths provided\n",
max_path_cnt);
ret = -ENOMEM;
goto out;
}
p = match_strdup(args);
if (!p) {
ret = -ENOMEM;
goto out;
}
ret = rtrs_addr_to_sockaddr(p, strlen(p),
*opt->dest_port,
&opt->paths[p_cnt]);
if (ret) {
pr_err("Can't parse path %s: %d\n", p, ret);
kfree(p);
goto out;
}
p_cnt++;
kfree(p);
break;
case RNBD_OPT_DEV_PATH:
p = match_strdup(args);
if (!p) {
ret = -ENOMEM;
goto out;
}
if (strlen(p) > NAME_MAX) {
pr_err("map_device: Device path too long\n");
ret = -EINVAL;
kfree(p);
goto out;
}
strlcpy(opt->pathname, p, NAME_MAX);
kfree(p);
break;
case RNBD_OPT_DEST_PORT:
if (match_int(args, &dest_port) || dest_port < 0 ||
dest_port > 65535) {
pr_err("bad destination port number parameter '%d'\n",
dest_port);
ret = -EINVAL;
goto out;
}
*opt->dest_port = dest_port;
break;
case RNBD_OPT_ACCESS_MODE:
p = match_strdup(args);
if (!p) {
ret = -ENOMEM;
goto out;
}
if (!strcmp(p, "ro")) {
*opt->access_mode = RNBD_ACCESS_RO;
} else if (!strcmp(p, "rw")) {
*opt->access_mode = RNBD_ACCESS_RW;
} else if (!strcmp(p, "migration")) {
*opt->access_mode = RNBD_ACCESS_MIGRATION;
} else {
pr_err("map_device: Invalid access_mode: '%s'\n",
p);
ret = -EINVAL;
kfree(p);
goto out;
}
kfree(p);
break;
default:
pr_err("map_device: Unknown parameter or missing value '%s'\n",
p);
ret = -EINVAL;
goto out;
}
}
for (i = 0; i < ARRAY_SIZE(rnbd_opt_mandatory); i++) {
if ((opt_mask & rnbd_opt_mandatory[i])) {
ret = 0;
} else {
pr_err("map_device: Parameters missing\n");
ret = -EINVAL;
break;
}
}
out:
*opt->path_cnt = p_cnt;
kfree(options);
return ret;
}
static ssize_t state_show(struct kobject *kobj,
struct kobj_attribute *attr, char *page)
{
struct rnbd_clt_dev *dev;
dev = container_of(kobj, struct rnbd_clt_dev, kobj);
switch (dev->dev_state) {
case DEV_STATE_INIT:
return snprintf(page, PAGE_SIZE, "init\n");
case DEV_STATE_MAPPED:
/* TODO fix cli tool before changing to proper state */
return snprintf(page, PAGE_SIZE, "open\n");
case DEV_STATE_MAPPED_DISCONNECTED:
/* TODO fix cli tool before changing to proper state */
return snprintf(page, PAGE_SIZE, "closed\n");
case DEV_STATE_UNMAPPED:
return snprintf(page, PAGE_SIZE, "unmapped\n");
default:
return snprintf(page, PAGE_SIZE, "unknown\n");
}
}
static struct kobj_attribute rnbd_clt_state_attr = __ATTR_RO(state);
static ssize_t mapping_path_show(struct kobject *kobj,
struct kobj_attribute *attr, char *page)
{
struct rnbd_clt_dev *dev;
dev = container_of(kobj, struct rnbd_clt_dev, kobj);
return scnprintf(page, PAGE_SIZE, "%s\n", dev->pathname);
}
static struct kobj_attribute rnbd_clt_mapping_path_attr =
__ATTR_RO(mapping_path);
static ssize_t access_mode_show(struct kobject *kobj,
struct kobj_attribute *attr, char *page)
{
struct rnbd_clt_dev *dev;
dev = container_of(kobj, struct rnbd_clt_dev, kobj);
return snprintf(page, PAGE_SIZE, "%s\n",
rnbd_access_mode_str(dev->access_mode));
}
static struct kobj_attribute rnbd_clt_access_mode =
__ATTR_RO(access_mode);
static ssize_t rnbd_clt_unmap_dev_show(struct kobject *kobj,
struct kobj_attribute *attr, char *page)
{
return scnprintf(page, PAGE_SIZE, "Usage: echo <normal|force> > %s\n",
attr->attr.name);
}
static ssize_t rnbd_clt_unmap_dev_store(struct kobject *kobj,
struct kobj_attribute *attr,
const char *buf, size_t count)
{
struct rnbd_clt_dev *dev;
char *opt, *options;
bool force;
int err;
opt = kstrdup(buf, GFP_KERNEL);
if (!opt)
return -ENOMEM;
options = strstrip(opt);
dev = container_of(kobj, struct rnbd_clt_dev, kobj);
if (sysfs_streq(options, "normal")) {
force = false;
} else if (sysfs_streq(options, "force")) {
force = true;
} else {
rnbd_clt_err(dev,
"unmap_device: Invalid value: %s\n",
options);
err = -EINVAL;
goto out;
}
rnbd_clt_info(dev, "Unmapping device, option: %s.\n",
force ? "force" : "normal");
/*
* We take explicit module reference only for one reason: do not
* race with lockless rnbd_destroy_sessions().
*/
if (!try_module_get(THIS_MODULE)) {
err = -ENODEV;
goto out;
}
err = rnbd_clt_unmap_device(dev, force, &attr->attr);
if (err) {
if (err != -EALREADY)
rnbd_clt_err(dev, "unmap_device: %d\n", err);
goto module_put;
}
/*
* Here the device may already be gone!
*/
err = count;
module_put:
module_put(THIS_MODULE);
out:
kfree(opt);
return err;
}
static struct kobj_attribute rnbd_clt_unmap_device_attr =
__ATTR(unmap_device, 0644, rnbd_clt_unmap_dev_show,
rnbd_clt_unmap_dev_store);
static ssize_t rnbd_clt_resize_dev_show(struct kobject *kobj,
struct kobj_attribute *attr,
char *page)
{
return scnprintf(page, PAGE_SIZE,
"Usage: echo <new size in sectors> > %s\n",
attr->attr.name);
}
static ssize_t rnbd_clt_resize_dev_store(struct kobject *kobj,
struct kobj_attribute *attr,
const char *buf, size_t count)
{
int ret;
unsigned long sectors;
struct rnbd_clt_dev *dev;
dev = container_of(kobj, struct rnbd_clt_dev, kobj);
ret = kstrtoul(buf, 0, &sectors);
if (ret)
return ret;
ret = rnbd_clt_resize_disk(dev, (size_t)sectors);
if (ret)
return ret;
return count;
}
static struct kobj_attribute rnbd_clt_resize_dev_attr =
__ATTR(resize, 0644, rnbd_clt_resize_dev_show,
rnbd_clt_resize_dev_store);
static ssize_t rnbd_clt_remap_dev_show(struct kobject *kobj,
struct kobj_attribute *attr, char *page)
{
return scnprintf(page, PAGE_SIZE, "Usage: echo <1> > %s\n",
attr->attr.name);
}
static ssize_t rnbd_clt_remap_dev_store(struct kobject *kobj,
struct kobj_attribute *attr,
const char *buf, size_t count)
{
struct rnbd_clt_dev *dev;
char *opt, *options;
int err;
opt = kstrdup(buf, GFP_KERNEL);
if (!opt)
return -ENOMEM;
options = strstrip(opt);
dev = container_of(kobj, struct rnbd_clt_dev, kobj);
if (!sysfs_streq(options, "1")) {
rnbd_clt_err(dev,
"remap_device: Invalid value: %s\n",
options);
err = -EINVAL;
goto out;
}
err = rnbd_clt_remap_device(dev);
if (likely(!err))
err = count;
out:
kfree(opt);
return err;
}
static struct kobj_attribute rnbd_clt_remap_device_attr =
__ATTR(remap_device, 0644, rnbd_clt_remap_dev_show,
rnbd_clt_remap_dev_store);
static ssize_t session_show(struct kobject *kobj, struct kobj_attribute *attr,
char *page)
{
struct rnbd_clt_dev *dev;
dev = container_of(kobj, struct rnbd_clt_dev, kobj);
return scnprintf(page, PAGE_SIZE, "%s\n", dev->sess->sessname);
}
static struct kobj_attribute rnbd_clt_session_attr =
__ATTR_RO(session);
static struct attribute *rnbd_dev_attrs[] = {
&rnbd_clt_unmap_device_attr.attr,
&rnbd_clt_resize_dev_attr.attr,
&rnbd_clt_remap_device_attr.attr,
&rnbd_clt_mapping_path_attr.attr,
&rnbd_clt_state_attr.attr,
&rnbd_clt_session_attr.attr,
&rnbd_clt_access_mode.attr,
NULL,
};
void rnbd_clt_remove_dev_symlink(struct rnbd_clt_dev *dev)
{
/*
* The module unload path (rnbd_client_exit) races with a manual unmap of
* the last device through sysfs, i.e. rnbd_clt_unmap_dev_store(), which
* leads to a sysfs warning because the sysfs link was already removed.
*/
if (strlen(dev->blk_symlink_name) && try_module_get(THIS_MODULE)) {
sysfs_remove_link(rnbd_devs_kobj, dev->blk_symlink_name);
module_put(THIS_MODULE);
}
}
static struct kobj_type rnbd_dev_ktype = {
.sysfs_ops = &kobj_sysfs_ops,
.default_attrs = rnbd_dev_attrs,
};
static int rnbd_clt_add_dev_kobj(struct rnbd_clt_dev *dev)
{
int ret;
struct kobject *gd_kobj = &disk_to_dev(dev->gd)->kobj;
ret = kobject_init_and_add(&dev->kobj, &rnbd_dev_ktype, gd_kobj, "%s",
"rnbd");
if (ret)
rnbd_clt_err(dev, "Failed to create device sysfs dir, err: %d\n",
ret);
return ret;
}
static ssize_t rnbd_clt_map_device_show(struct kobject *kobj,
struct kobj_attribute *attr,
char *page)
{
return scnprintf(page, PAGE_SIZE,
"Usage: echo \"[dest_port=server port number] sessname=<name of the rtrs session> path=<[srcaddr@]dstaddr> [path=<[srcaddr@]dstaddr>] device_path=<full path on remote side> [access_mode=<ro|rw|migration>]\" > %s\n\naddr ::= [ ip:<ipv4> | ip:<ipv6> | gid:<gid> ]\n",
attr->attr.name);
}
static int rnbd_clt_get_path_name(struct rnbd_clt_dev *dev, char *buf,
size_t len)
{
int ret;
char pathname[NAME_MAX], *s;
strlcpy(pathname, dev->pathname, sizeof(pathname));
while ((s = strchr(pathname, '/')))
s[0] = '!';
ret = snprintf(buf, len, "%s", pathname);
if (ret >= len)
return -ENAMETOOLONG;
return 0;
}
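rnbd_clt_get_path_name() flattens the device path into a single sysfs-safe name by replacing every '/' with '!', so e.g. "/dev/nvme0n1" becomes "!dev!nvme0n1". A standalone sketch of that substitution:

```c
#include <assert.h>
#include <string.h>

/* Replace every '/' in-place with '!', as the symlink-name code above does. */
static void mangle_path(char *p)
{
	for (; (p = strchr(p, '/')) != NULL; p++)
		*p = '!';
}

/* Demo wrapper so the result can be inspected. */
static const char *mangled_example(void)
{
	static char buf[] = "/dev/nvme0n1";

	mangle_path(buf);
	return buf;
}
```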
static int rnbd_clt_add_dev_symlink(struct rnbd_clt_dev *dev)
{
struct kobject *gd_kobj = &disk_to_dev(dev->gd)->kobj;
int ret;
ret = rnbd_clt_get_path_name(dev, dev->blk_symlink_name,
sizeof(dev->blk_symlink_name));
if (ret) {
rnbd_clt_err(dev, "Failed to get /sys/block symlink path, err: %d\n",
ret);
goto out_err;
}
ret = sysfs_create_link(rnbd_devs_kobj, gd_kobj,
dev->blk_symlink_name);
if (ret) {
rnbd_clt_err(dev, "Creating /sys/block symlink failed, err: %d\n",
ret);
goto out_err;
}
return 0;
out_err:
dev->blk_symlink_name[0] = '\0';
return ret;
}
static ssize_t rnbd_clt_map_device_store(struct kobject *kobj,
struct kobj_attribute *attr,
const char *buf, size_t count)
{
struct rnbd_clt_dev *dev;
struct rnbd_map_options opt;
int ret;
char pathname[NAME_MAX];
char sessname[NAME_MAX];
enum rnbd_access_mode access_mode = RNBD_ACCESS_RW;
u16 port_nr = RTRS_PORT;
struct sockaddr_storage *addrs;
struct rtrs_addr paths[6];
size_t path_cnt;
opt.sessname = sessname;
opt.paths = paths;
opt.path_cnt = &path_cnt;
opt.pathname = pathname;
opt.dest_port = &port_nr;
opt.access_mode = &access_mode;
addrs = kcalloc(ARRAY_SIZE(paths) * 2, sizeof(*addrs), GFP_KERNEL);
if (!addrs)
return -ENOMEM;
for (path_cnt = 0; path_cnt < ARRAY_SIZE(paths); path_cnt++) {
paths[path_cnt].src = &addrs[path_cnt * 2];
paths[path_cnt].dst = &addrs[path_cnt * 2 + 1];
}
ret = rnbd_clt_parse_map_options(buf, ARRAY_SIZE(paths), &opt);
if (ret)
goto out;
pr_info("Mapping device %s on session %s, (access_mode: %s)\n",
pathname, sessname,
rnbd_access_mode_str(access_mode));
dev = rnbd_clt_map_device(sessname, paths, path_cnt, port_nr, pathname,
access_mode);
if (IS_ERR(dev)) {
ret = PTR_ERR(dev);
goto out;
}
ret = rnbd_clt_add_dev_kobj(dev);
if (ret)
goto unmap_dev;
ret = rnbd_clt_add_dev_symlink(dev);
if (ret)
goto unmap_dev;
kfree(addrs);
return count;
unmap_dev:
rnbd_clt_unmap_device(dev, true, NULL);
out:
kfree(addrs);
return ret;
}
static struct kobj_attribute rnbd_clt_map_device_attr =
__ATTR(map_device, 0644,
rnbd_clt_map_device_show, rnbd_clt_map_device_store);
static struct attribute *default_attrs[] = {
&rnbd_clt_map_device_attr.attr,
NULL,
};
static struct attribute_group default_attr_group = {
.attrs = default_attrs,
};
static const struct attribute_group *default_attr_groups[] = {
&default_attr_group,
NULL,
};
int rnbd_clt_create_sysfs_files(void)
{
int err;
rnbd_dev_class = class_create(THIS_MODULE, "rnbd-client");
if (IS_ERR(rnbd_dev_class))
return PTR_ERR(rnbd_dev_class);
rnbd_dev = device_create_with_groups(rnbd_dev_class, NULL,
MKDEV(0, 0), NULL,
default_attr_groups, "ctl");
if (IS_ERR(rnbd_dev)) {
err = PTR_ERR(rnbd_dev);
goto cls_destroy;
}
rnbd_devs_kobj = kobject_create_and_add("devices", &rnbd_dev->kobj);
if (!rnbd_devs_kobj) {
err = -ENOMEM;
goto dev_destroy;
}
return 0;
dev_destroy:
device_destroy(rnbd_dev_class, MKDEV(0, 0));
cls_destroy:
class_destroy(rnbd_dev_class);
return err;
}
void rnbd_clt_destroy_default_group(void)
{
sysfs_remove_group(&rnbd_dev->kobj, &default_attr_group);
}
void rnbd_clt_destroy_sysfs_files(void)
{
kobject_del(rnbd_devs_kobj);
kobject_put(rnbd_devs_kobj);
device_destroy(rnbd_dev_class, MKDEV(0, 0));
class_destroy(rnbd_dev_class);
}

/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
* RDMA Network Block Driver
*
* Copyright (c) 2014 - 2018 ProfitBricks GmbH. All rights reserved.
* Copyright (c) 2018 - 2019 1&1 IONOS Cloud GmbH. All rights reserved.
* Copyright (c) 2019 - 2020 1&1 IONOS SE. All rights reserved.
*/
#ifndef RNBD_CLT_H
#define RNBD_CLT_H
#include <linux/wait.h>
#include <linux/in.h>
#include <linux/inet.h>
#include <linux/blk-mq.h>
#include <linux/refcount.h>
#include <rtrs.h>
#include "rnbd-proto.h"
#include "rnbd-log.h"
/* Max. number of segments per IO request: Mellanox ConnectX through
* ConnectX-5 HCAs support at least 30; minus 1 for the internal protocol,
* so 29.
*/
#define BMAX_SEGMENTS 29
/* time in seconds between reconnect tries, defaults to 30 s */
#define RECONNECT_DELAY 30
/*
* Number of times to reconnect on error before giving up, 0 for disabled,
* -1 for forever
*/
#define MAX_RECONNECTS -1
enum rnbd_clt_dev_state {
DEV_STATE_INIT,
DEV_STATE_MAPPED,
DEV_STATE_MAPPED_DISCONNECTED,
DEV_STATE_UNMAPPED,
};
struct rnbd_iu_comp {
wait_queue_head_t wait;
int errno;
};
struct rnbd_iu {
union {
struct request *rq; /* for block io */
void *buf; /* for user messages */
};
struct rtrs_permit *permit;
union {
/* use to send msg associated with a dev */
struct rnbd_clt_dev *dev;
/* use to send msg associated with a sess */
struct rnbd_clt_session *sess;
};
struct scatterlist sglist[BMAX_SEGMENTS];
struct work_struct work;
int errno;
struct rnbd_iu_comp comp;
atomic_t refcount;
};
struct rnbd_cpu_qlist {
struct list_head requeue_list;
spinlock_t requeue_lock;
unsigned int cpu;
};
struct rnbd_clt_session {
struct list_head list;
struct rtrs_clt *rtrs;
wait_queue_head_t rtrs_waitq;
bool rtrs_ready;
struct rnbd_cpu_qlist __percpu
*cpu_queues;
DECLARE_BITMAP(cpu_queues_bm, NR_CPUS);
int __percpu *cpu_rr; /* per-cpu var for CPU round-robin */
atomic_t busy;
int queue_depth;
u32 max_io_size;
struct blk_mq_tag_set tag_set;
struct mutex lock; /* protects state and devs_list */
struct list_head devs_list; /* list of struct rnbd_clt_dev */
refcount_t refcount;
char sessname[NAME_MAX];
u8 ver; /* protocol version */
};
/*
* Submission queues.
*/
struct rnbd_queue {
struct list_head requeue_list;
unsigned long in_list;
struct rnbd_clt_dev *dev;
struct blk_mq_hw_ctx *hctx;
};
struct rnbd_clt_dev {
struct rnbd_clt_session *sess;
struct request_queue *queue;
struct rnbd_queue *hw_queues;
u32 device_id;
/* local IDR index - used to track minor number allocations. */
u32 clt_device_id;
struct mutex lock;
enum rnbd_clt_dev_state dev_state;
char pathname[NAME_MAX];
enum rnbd_access_mode access_mode;
bool read_only;
bool rotational;
u32 max_hw_sectors;
u32 max_write_same_sectors;
u32 max_discard_sectors;
u32 discard_granularity;
u32 discard_alignment;
u16 secure_discard;
u16 physical_block_size;
u16 logical_block_size;
u16 max_segments;
size_t nsectors;
u64 size; /* device size in bytes */
struct list_head list;
struct gendisk *gd;
struct kobject kobj;
char blk_symlink_name[NAME_MAX];
refcount_t refcount;
struct work_struct unmap_on_rmmod_work;
};
/* rnbd-clt.c */
struct rnbd_clt_dev *rnbd_clt_map_device(const char *sessname,
struct rtrs_addr *paths,
size_t path_cnt, u16 port_nr,
const char *pathname,
enum rnbd_access_mode access_mode);
int rnbd_clt_unmap_device(struct rnbd_clt_dev *dev, bool force,
const struct attribute *sysfs_self);
int rnbd_clt_remap_device(struct rnbd_clt_dev *dev);
int rnbd_clt_resize_disk(struct rnbd_clt_dev *dev, size_t newsize);
/* rnbd-clt-sysfs.c */
int rnbd_clt_create_sysfs_files(void);
void rnbd_clt_destroy_sysfs_files(void);
void rnbd_clt_destroy_default_group(void);
void rnbd_clt_remove_dev_symlink(struct rnbd_clt_dev *dev);
#endif /* RNBD_CLT_H */

// SPDX-License-Identifier: GPL-2.0-or-later
/*
* RDMA Network Block Driver
*
* Copyright (c) 2014 - 2018 ProfitBricks GmbH. All rights reserved.
* Copyright (c) 2018 - 2019 1&1 IONOS Cloud GmbH. All rights reserved.
* Copyright (c) 2019 - 2020 1&1 IONOS SE. All rights reserved.
*/
#include "rnbd-proto.h"
const char *rnbd_access_mode_str(enum rnbd_access_mode mode)
{
switch (mode) {
case RNBD_ACCESS_RO:
return "ro";
case RNBD_ACCESS_RW:
return "rw";
case RNBD_ACCESS_MIGRATION:
return "migration";
default:
return "unknown";
}
}

/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
* RDMA Network Block Driver
*
* Copyright (c) 2014 - 2018 ProfitBricks GmbH. All rights reserved.
* Copyright (c) 2018 - 2019 1&1 IONOS Cloud GmbH. All rights reserved.
* Copyright (c) 2019 - 2020 1&1 IONOS SE. All rights reserved.
*/
#ifndef RNBD_LOG_H
#define RNBD_LOG_H
#include "rnbd-clt.h"
#include "rnbd-srv.h"
#define rnbd_clt_log(fn, dev, fmt, ...) ( \
fn("<%s@%s> " fmt, (dev)->pathname, \
(dev)->sess->sessname, \
##__VA_ARGS__))
#define rnbd_srv_log(fn, dev, fmt, ...) ( \
fn("<%s@%s>: " fmt, (dev)->pathname, \
(dev)->sess->sessname, ##__VA_ARGS__))
#define rnbd_clt_err(dev, fmt, ...) \
rnbd_clt_log(pr_err, dev, fmt, ##__VA_ARGS__)
#define rnbd_clt_err_rl(dev, fmt, ...) \
rnbd_clt_log(pr_err_ratelimited, dev, fmt, ##__VA_ARGS__)
#define rnbd_clt_info(dev, fmt, ...) \
rnbd_clt_log(pr_info, dev, fmt, ##__VA_ARGS__)
#define rnbd_clt_info_rl(dev, fmt, ...) \
rnbd_clt_log(pr_info_ratelimited, dev, fmt, ##__VA_ARGS__)
#define rnbd_srv_err(dev, fmt, ...) \
rnbd_srv_log(pr_err, dev, fmt, ##__VA_ARGS__)
#define rnbd_srv_err_rl(dev, fmt, ...) \
rnbd_srv_log(pr_err_ratelimited, dev, fmt, ##__VA_ARGS__)
#define rnbd_srv_info(dev, fmt, ...) \
rnbd_srv_log(pr_info, dev, fmt, ##__VA_ARGS__)
#define rnbd_srv_info_rl(dev, fmt, ...) \
rnbd_srv_log(pr_info_ratelimited, dev, fmt, ##__VA_ARGS__)
#endif /* RNBD_LOG_H */

/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
* RDMA Network Block Driver
*
* Copyright (c) 2014 - 2018 ProfitBricks GmbH. All rights reserved.
* Copyright (c) 2018 - 2019 1&1 IONOS Cloud GmbH. All rights reserved.
* Copyright (c) 2019 - 2020 1&1 IONOS SE. All rights reserved.
*/
#ifndef RNBD_PROTO_H
#define RNBD_PROTO_H
#include <linux/types.h>
#include <linux/blkdev.h>
#include <linux/limits.h>
#include <linux/inet.h>
#include <linux/in.h>
#include <linux/in6.h>
#include <rdma/ib.h>
#define RNBD_PROTO_VER_MAJOR 2
#define RNBD_PROTO_VER_MINOR 0
/* The default port number the RTRS server is listening on. */
#define RTRS_PORT 1234
/**
* enum rnbd_msg_type - RNBD message types
* @RNBD_MSG_SESS_INFO: initial session info from client to server
* @RNBD_MSG_SESS_INFO_RSP: initial session info from server to client
* @RNBD_MSG_OPEN: open (map) device request
* @RNBD_MSG_OPEN_RSP: response to an @RNBD_MSG_OPEN
* @RNBD_MSG_IO: block IO request operation
* @RNBD_MSG_CLOSE: close (unmap) device request
*/
enum rnbd_msg_type {
RNBD_MSG_SESS_INFO,
RNBD_MSG_SESS_INFO_RSP,
RNBD_MSG_OPEN,
RNBD_MSG_OPEN_RSP,
RNBD_MSG_IO,
RNBD_MSG_CLOSE,
};
/**
* struct rnbd_msg_hdr - header of RNBD messages
* @type: Message type, valid values see: enum rnbd_msg_type
*/
struct rnbd_msg_hdr {
__le16 type;
__le16 __padding;
};
/*
* We allow a device to be mapped RO many times, but RW only once. A second
* RW mapping is allowed only if MIGRATION is specified (a second RW export
* can be required, for example, for VM migration).
*/
enum rnbd_access_mode {
RNBD_ACCESS_RO,
RNBD_ACCESS_RW,
RNBD_ACCESS_MIGRATION,
};
/**
* struct rnbd_msg_sess_info - initial session info from client to server
* @hdr: message header
* @ver: RNBD protocol version
*/
struct rnbd_msg_sess_info {
struct rnbd_msg_hdr hdr;
u8 ver;
u8 reserved[31];
};
/**
* struct rnbd_msg_sess_info_rsp - initial session info from server to client
* @hdr: message header
* @ver: RNBD protocol version
*/
struct rnbd_msg_sess_info_rsp {
struct rnbd_msg_hdr hdr;
u8 ver;
u8 reserved[31];
};
/**
* struct rnbd_msg_open - request to open a remote device.
* @hdr: message header
* @access_mode: the mode to open remote device, valid values see:
* enum rnbd_access_mode
* @device_name: device path on remote side
*/
struct rnbd_msg_open {
struct rnbd_msg_hdr hdr;
u8 access_mode;
u8 resv1;
s8 dev_name[NAME_MAX];
u8 reserved[3];
};
/**
* struct rnbd_msg_close - request to close a remote device.
* @hdr: message header
* @device_id: device_id on server side to identify the device
*/
struct rnbd_msg_close {
struct rnbd_msg_hdr hdr;
__le32 device_id;
};
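The wire messages above are fixed-size, little-endian structures. A userspace mirror (using plain uint16_t/uint32_t in place of __le16/__le32, and assuming, as the kernel structs do, that the compiler inserts no padding at these naturally aligned offsets) can sanity-check the layout:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mirror of struct rnbd_msg_hdr: 2 + 2 bytes, no padding. */
struct hdr_mirror {
	uint16_t type;
	uint16_t padding;
};

/* Userspace mirror of struct rnbd_msg_close: 4-byte header + device_id. */
struct close_mirror {
	struct hdr_mirror hdr;
	uint32_t device_id;
};
```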
/**
* struct rnbd_msg_open_rsp - response message to RNBD_MSG_OPEN
* @hdr: message header
* @device_id: device_id on server side to identify the device
* @nsectors: number of sectors in the usual 512b unit
* @max_hw_sectors: max hardware sectors in the usual 512b unit
* @max_write_same_sectors: max sectors for WRITE SAME in the 512b unit
* @max_discard_sectors: max. sectors that can be discarded at once in 512b
* unit.
* @discard_granularity: size of the internal discard allocation unit in bytes
* @discard_alignment: offset from internal allocation assignment in bytes
* @physical_block_size: physical block size device supports in bytes
* @logical_block_size: logical block size device supports in bytes
* @max_segments: max segments hardware support in one transfer
* @secure_discard: supports secure discard
* @rotational: is the device a rotational disk?
*/
struct rnbd_msg_open_rsp {
struct rnbd_msg_hdr hdr;
__le32 device_id;
__le64 nsectors;
__le32 max_hw_sectors;
__le32 max_write_same_sectors;
__le32 max_discard_sectors;
__le32 discard_granularity;
__le32 discard_alignment;
__le16 physical_block_size;
__le16 logical_block_size;
__le16 max_segments;
__le16 secure_discard;
u8 rotational;
u8 reserved[11];
};
/**
* struct rnbd_msg_io - message for I/O read/write
* @hdr: message header
* @device_id: device_id on server side to find the right device
* @sector: bi_sector attribute from struct bio
* @rw: valid values are defined in enum rnbd_io_flags
* @bi_size: number of bytes for I/O read/write
* @prio: priority
*/
struct rnbd_msg_io {
struct rnbd_msg_hdr hdr;
__le32 device_id;
__le64 sector;
__le32 rw;
__le32 bi_size;
__le16 prio;
};
#define RNBD_OP_BITS 8
#define RNBD_OP_MASK ((1 << RNBD_OP_BITS) - 1)
/**
* enum rnbd_io_flags - RNBD request types from rq_flag_bits
* @RNBD_OP_READ: read sectors from the device
* @RNBD_OP_WRITE: write sectors to the device
* @RNBD_OP_FLUSH: flush the volatile write cache
* @RNBD_OP_DISCARD: discard sectors
* @RNBD_OP_SECURE_ERASE: securely erase sectors
* @RNBD_OP_WRITE_SAME: write the same sectors many times
* @RNBD_F_SYNC: request is sync (sync write or read)
* @RNBD_F_FUA: forced unit access
*/
enum rnbd_io_flags {
/* Operations */
RNBD_OP_READ = 0,
RNBD_OP_WRITE = 1,
RNBD_OP_FLUSH = 2,
RNBD_OP_DISCARD = 3,
RNBD_OP_SECURE_ERASE = 4,
RNBD_OP_WRITE_SAME = 5,
RNBD_OP_LAST,
/* Flags */
RNBD_F_SYNC = 1<<(RNBD_OP_BITS + 0),
RNBD_F_FUA = 1<<(RNBD_OP_BITS + 1),
RNBD_F_ALL = (RNBD_F_SYNC | RNBD_F_FUA)
};
static inline u32 rnbd_op(u32 flags)
{
return flags & RNBD_OP_MASK;
}
static inline u32 rnbd_flags(u32 flags)
{
return flags & ~RNBD_OP_MASK;
}
static inline bool rnbd_flags_supported(u32 flags)
{
u32 op;
op = rnbd_op(flags);
flags = rnbd_flags(flags);
if (op >= RNBD_OP_LAST)
return false;
if (flags & ~RNBD_F_ALL)
return false;
return true;
}
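rnbd_op() and rnbd_flags() split one u32 into an operation in the low RNBD_OP_BITS bits and modifier flags above them. A self-contained sketch of the same packing (constants renamed to avoid clashing with the kernel definitions):

```c
#include <assert.h>

/* Low OP_BITS bits carry the operation; bits above carry modifier flags. */
#define OP_BITS 8
#define OP_MASK ((1u << OP_BITS) - 1)
#define F_SYNC  (1u << (OP_BITS + 0))
#define F_FUA   (1u << (OP_BITS + 1))

/* Combine an operation code and flag bits into one word. */
static unsigned int pack(unsigned int op, unsigned int flags)
{
	return (op & OP_MASK) | flags;
}
```

Masking with OP_MASK recovers the operation; masking with ~OP_MASK recovers the flags, exactly as the two inline helpers above do.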
static inline u32 rnbd_to_bio_flags(u32 rnbd_opf)
{
u32 bio_opf;
switch (rnbd_op(rnbd_opf)) {
case RNBD_OP_READ:
bio_opf = REQ_OP_READ;
break;
case RNBD_OP_WRITE:
bio_opf = REQ_OP_WRITE;
break;
case RNBD_OP_FLUSH:
bio_opf = REQ_OP_FLUSH | REQ_PREFLUSH;
break;
case RNBD_OP_DISCARD:
bio_opf = REQ_OP_DISCARD;
break;
case RNBD_OP_SECURE_ERASE:
bio_opf = REQ_OP_SECURE_ERASE;
break;
case RNBD_OP_WRITE_SAME:
bio_opf = REQ_OP_WRITE_SAME;
break;
default:
WARN(1, "Unknown RNBD type: %d (flags %d)\n",
rnbd_op(rnbd_opf), rnbd_opf);
bio_opf = 0;
}
if (rnbd_opf & RNBD_F_SYNC)
bio_opf |= REQ_SYNC;
if (rnbd_opf & RNBD_F_FUA)
bio_opf |= REQ_FUA;
return bio_opf;
}
static inline u32 rq_to_rnbd_flags(struct request *rq)
{
u32 rnbd_opf;
switch (req_op(rq)) {
case REQ_OP_READ:
rnbd_opf = RNBD_OP_READ;
break;
case REQ_OP_WRITE:
rnbd_opf = RNBD_OP_WRITE;
break;
case REQ_OP_DISCARD:
rnbd_opf = RNBD_OP_DISCARD;
break;
case REQ_OP_SECURE_ERASE:
rnbd_opf = RNBD_OP_SECURE_ERASE;
break;
case REQ_OP_WRITE_SAME:
rnbd_opf = RNBD_OP_WRITE_SAME;
break;
case REQ_OP_FLUSH:
rnbd_opf = RNBD_OP_FLUSH;
break;
default:
WARN(1, "Unknown request type %d (flags %llu)\n",
req_op(rq), (unsigned long long)rq->cmd_flags);
rnbd_opf = 0;
}
if (op_is_sync(rq->cmd_flags))
rnbd_opf |= RNBD_F_SYNC;
if (op_is_flush(rq->cmd_flags))
rnbd_opf |= RNBD_F_FUA;
return rnbd_opf;
}
const char *rnbd_access_mode_str(enum rnbd_access_mode mode);
#endif /* RNBD_PROTO_H */

// SPDX-License-Identifier: GPL-2.0-or-later
/*
* RDMA Network Block Driver
*
* Copyright (c) 2014 - 2018 ProfitBricks GmbH. All rights reserved.
* Copyright (c) 2018 - 2019 1&1 IONOS Cloud GmbH. All rights reserved.
* Copyright (c) 2019 - 2020 1&1 IONOS SE. All rights reserved.
*/
#undef pr_fmt
#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
#include "rnbd-srv-dev.h"
#include "rnbd-log.h"
struct rnbd_dev *rnbd_dev_open(const char *path, fmode_t flags,
struct bio_set *bs)
{
struct rnbd_dev *dev;
int ret;
dev = kzalloc(sizeof(*dev), GFP_KERNEL);
if (!dev)
return ERR_PTR(-ENOMEM);
dev->blk_open_flags = flags;
dev->bdev = blkdev_get_by_path(path, flags, THIS_MODULE);
ret = PTR_ERR_OR_ZERO(dev->bdev);
if (ret)
goto err;
dev->blk_open_flags = flags;
bdevname(dev->bdev, dev->name);
dev->ibd_bio_set = bs;
return dev;
err:
kfree(dev);
return ERR_PTR(ret);
}
void rnbd_dev_close(struct rnbd_dev *dev)
{
blkdev_put(dev->bdev, dev->blk_open_flags);
kfree(dev);
}
static void rnbd_dev_bi_end_io(struct bio *bio)
{
struct rnbd_dev_blk_io *io = bio->bi_private;
rnbd_endio(io->priv, blk_status_to_errno(bio->bi_status));
bio_put(bio);
}
/**
* rnbd_bio_map_kern - map kernel address into bio
* @data: pointer to buffer to map
* @bs: bio_set to use.
* @len: length in bytes
* @gfp_mask: allocation flags for bio allocation
*
* Map the kernel address into a bio suitable for io to a block
* device. Returns an error pointer in case of error.
*/
static struct bio *rnbd_bio_map_kern(void *data, struct bio_set *bs,
unsigned int len, gfp_t gfp_mask)
{
unsigned long kaddr = (unsigned long)data;
unsigned long end = (kaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
unsigned long start = kaddr >> PAGE_SHIFT;
const int nr_pages = end - start;
int offset, i;
struct bio *bio;
bio = bio_alloc_bioset(gfp_mask, nr_pages, bs);
if (!bio)
return ERR_PTR(-ENOMEM);
offset = offset_in_page(kaddr);
for (i = 0; i < nr_pages; i++) {
unsigned int bytes = PAGE_SIZE - offset;
if (len <= 0)
break;
if (bytes > len)
bytes = len;
if (bio_add_page(bio, virt_to_page(data), bytes,
offset) < bytes) {
/* we don't support partial mappings */
bio_put(bio);
return ERR_PTR(-EINVAL);
}
data += bytes;
len -= bytes;
offset = 0;
}
bio->bi_end_io = bio_put;
return bio;
}
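The nr_pages computation in rnbd_bio_map_kern() must account for a buffer that starts mid-page: it counts from the page containing the first byte through the page containing the last. The same arithmetic in isolation (a fixed 4 KiB page size is assumed here purely for illustration):

```c
#include <assert.h>

#define PG_SIZE  4096UL
#define PG_SHIFT 12

/*
 * Number of pages spanned by [kaddr, kaddr + len): round the end address up
 * to a page boundary and subtract the start page index.
 */
static unsigned long nr_pages_for(unsigned long kaddr, unsigned long len)
{
	unsigned long start = kaddr >> PG_SHIFT;
	unsigned long end = (kaddr + len + PG_SIZE - 1) >> PG_SHIFT;

	return end - start;
}
```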
int rnbd_dev_submit_io(struct rnbd_dev *dev, sector_t sector, void *data,
size_t len, u32 bi_size, enum rnbd_io_flags flags,
short prio, void *priv)
{
struct rnbd_dev_blk_io *io;
struct bio *bio;
/* Generate bio with pages pointing to the rdma buffer */
bio = rnbd_bio_map_kern(data, dev->ibd_bio_set, len, GFP_KERNEL);
if (IS_ERR(bio))
return PTR_ERR(bio);
io = container_of(bio, struct rnbd_dev_blk_io, bio);
io->dev = dev;
io->priv = priv;
bio->bi_end_io = rnbd_dev_bi_end_io;
bio->bi_private = io;
bio->bi_opf = rnbd_to_bio_flags(flags);
bio->bi_iter.bi_sector = sector;
bio->bi_iter.bi_size = bi_size;
bio_set_prio(bio, prio);
bio_set_dev(bio, dev->bdev);
submit_bio(bio);
return 0;
}

/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
* RDMA Network Block Driver
*
* Copyright (c) 2014 - 2018 ProfitBricks GmbH. All rights reserved.
* Copyright (c) 2018 - 2019 1&1 IONOS Cloud GmbH. All rights reserved.
* Copyright (c) 2019 - 2020 1&1 IONOS SE. All rights reserved.
*/
#ifndef RNBD_SRV_DEV_H
#define RNBD_SRV_DEV_H
#include <linux/fs.h>
#include "rnbd-proto.h"
struct rnbd_dev {
struct block_device *bdev;
struct bio_set *ibd_bio_set;
fmode_t blk_open_flags;
char name[BDEVNAME_SIZE];
};
struct rnbd_dev_blk_io {
struct rnbd_dev *dev;
void *priv;
/* have to be last member for front_pad usage of bioset_init */
struct bio bio;
};
/**
* rnbd_dev_open() - Open a device
* @path: path to the block device to open
* @flags: open flags
* @bs: bio_set to use during block io
*/
struct rnbd_dev *rnbd_dev_open(const char *path, fmode_t flags,
struct bio_set *bs);
/**
* rnbd_dev_close() - Close a device
*/
void rnbd_dev_close(struct rnbd_dev *dev);
void rnbd_endio(void *priv, int error);
static inline int rnbd_dev_get_max_segs(const struct rnbd_dev *dev)
{
return queue_max_segments(bdev_get_queue(dev->bdev));
}
static inline int rnbd_dev_get_max_hw_sects(const struct rnbd_dev *dev)
{
return queue_max_hw_sectors(bdev_get_queue(dev->bdev));
}
static inline int rnbd_dev_get_secure_discard(const struct rnbd_dev *dev)
{
return blk_queue_secure_erase(bdev_get_queue(dev->bdev));
}
static inline int rnbd_dev_get_max_discard_sects(const struct rnbd_dev *dev)
{
if (!blk_queue_discard(bdev_get_queue(dev->bdev)))
return 0;
return blk_queue_get_max_sectors(bdev_get_queue(dev->bdev),
REQ_OP_DISCARD);
}
static inline int rnbd_dev_get_discard_granularity(const struct rnbd_dev *dev)
{
return bdev_get_queue(dev->bdev)->limits.discard_granularity;
}
static inline int rnbd_dev_get_discard_alignment(const struct rnbd_dev *dev)
{
return bdev_get_queue(dev->bdev)->limits.discard_alignment;
}
/**
* rnbd_dev_submit_io() - Submit an I/O to the disk
* @dev: device to which the I/O is submitted
* @sector: address to read/write data to
* @data: I/O data to write or buffer to read I/O data into
* @len: length of @data
* @bi_size: amount of data that will be read/written
* @flags: IO flags, see enum rnbd_io_flags
* @prio: IO priority
* @priv: private data passed back to the completion handler
*/
int rnbd_dev_submit_io(struct rnbd_dev *dev, sector_t sector, void *data,
size_t len, u32 bi_size, enum rnbd_io_flags flags,
short prio, void *priv);
#endif /* RNBD_SRV_DEV_H */

// SPDX-License-Identifier: GPL-2.0-or-later
/*
* RDMA Network Block Driver
*
* Copyright (c) 2014 - 2018 ProfitBricks GmbH. All rights reserved.
* Copyright (c) 2018 - 2019 1&1 IONOS Cloud GmbH. All rights reserved.
* Copyright (c) 2019 - 2020 1&1 IONOS SE. All rights reserved.
*/
#undef pr_fmt
#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
#include <uapi/linux/limits.h>
#include <linux/kobject.h>
#include <linux/sysfs.h>
#include <linux/stat.h>
#include <linux/genhd.h>
#include <linux/list.h>
#include <linux/moduleparam.h>
#include <linux/device.h>
#include "rnbd-srv.h"
static struct device *rnbd_dev;
static struct class *rnbd_dev_class;
static struct kobject *rnbd_devs_kobj;
static void rnbd_srv_dev_release(struct kobject *kobj)
{
struct rnbd_srv_dev *dev;
dev = container_of(kobj, struct rnbd_srv_dev, dev_kobj);
kfree(dev);
}
static struct kobj_type dev_ktype = {
.sysfs_ops = &kobj_sysfs_ops,
.release = rnbd_srv_dev_release
};
int rnbd_srv_create_dev_sysfs(struct rnbd_srv_dev *dev,
struct block_device *bdev,
const char *dev_name)
{
struct kobject *bdev_kobj;
int ret;
ret = kobject_init_and_add(&dev->dev_kobj, &dev_ktype,
rnbd_devs_kobj, "%s", dev_name);
if (ret)
return ret;
dev->dev_sessions_kobj = kobject_create_and_add("sessions",
&dev->dev_kobj);
if (!dev->dev_sessions_kobj) {
ret = -ENOMEM;
goto put_dev_kobj;
}
bdev_kobj = &disk_to_dev(bdev->bd_disk)->kobj;
ret = sysfs_create_link(&dev->dev_kobj, bdev_kobj, "block_dev");
if (ret)
goto put_sess_kobj;
return 0;
put_sess_kobj:
kobject_put(dev->dev_sessions_kobj);
put_dev_kobj:
kobject_put(&dev->dev_kobj);
return ret;
}
void rnbd_srv_destroy_dev_sysfs(struct rnbd_srv_dev *dev)
{
sysfs_remove_link(&dev->dev_kobj, "block_dev");
kobject_del(dev->dev_sessions_kobj);
kobject_put(dev->dev_sessions_kobj);
kobject_del(&dev->dev_kobj);
kobject_put(&dev->dev_kobj);
}
static ssize_t read_only_show(struct kobject *kobj, struct kobj_attribute *attr,
char *page)
{
struct rnbd_srv_sess_dev *sess_dev;
sess_dev = container_of(kobj, struct rnbd_srv_sess_dev, kobj);
return scnprintf(page, PAGE_SIZE, "%d\n",
!(sess_dev->open_flags & FMODE_WRITE));
}
static struct kobj_attribute rnbd_srv_dev_session_ro_attr =
__ATTR_RO(read_only);
static ssize_t access_mode_show(struct kobject *kobj,
struct kobj_attribute *attr,
char *page)
{
struct rnbd_srv_sess_dev *sess_dev;
sess_dev = container_of(kobj, struct rnbd_srv_sess_dev, kobj);
return scnprintf(page, PAGE_SIZE, "%s\n",
rnbd_access_mode_str(sess_dev->access_mode));
}
static struct kobj_attribute rnbd_srv_dev_session_access_mode_attr =
__ATTR_RO(access_mode);
static ssize_t mapping_path_show(struct kobject *kobj,
struct kobj_attribute *attr, char *page)
{
struct rnbd_srv_sess_dev *sess_dev;
sess_dev = container_of(kobj, struct rnbd_srv_sess_dev, kobj);
return scnprintf(page, PAGE_SIZE, "%s\n", sess_dev->pathname);
}
static struct kobj_attribute rnbd_srv_dev_session_mapping_path_attr =
__ATTR_RO(mapping_path);
static struct attribute *rnbd_srv_default_dev_sessions_attrs[] = {
&rnbd_srv_dev_session_access_mode_attr.attr,
&rnbd_srv_dev_session_ro_attr.attr,
&rnbd_srv_dev_session_mapping_path_attr.attr,
NULL,
};
static struct attribute_group rnbd_srv_default_dev_session_attr_group = {
.attrs = rnbd_srv_default_dev_sessions_attrs,
};
void rnbd_srv_destroy_dev_session_sysfs(struct rnbd_srv_sess_dev *sess_dev)
{
sysfs_remove_group(&sess_dev->kobj,
&rnbd_srv_default_dev_session_attr_group);
kobject_del(&sess_dev->kobj);
kobject_put(&sess_dev->kobj);
}
static void rnbd_srv_sess_dev_release(struct kobject *kobj)
{
struct rnbd_srv_sess_dev *sess_dev;
sess_dev = container_of(kobj, struct rnbd_srv_sess_dev, kobj);
rnbd_destroy_sess_dev(sess_dev);
}
static struct kobj_type rnbd_srv_sess_dev_ktype = {
.sysfs_ops = &kobj_sysfs_ops,
.release = rnbd_srv_sess_dev_release,
};
int rnbd_srv_create_dev_session_sysfs(struct rnbd_srv_sess_dev *sess_dev)
{
int ret;
ret = kobject_init_and_add(&sess_dev->kobj, &rnbd_srv_sess_dev_ktype,
sess_dev->dev->dev_sessions_kobj, "%s",
sess_dev->sess->sessname);
if (ret)
return ret;
ret = sysfs_create_group(&sess_dev->kobj,
&rnbd_srv_default_dev_session_attr_group);
if (ret)
goto err;
return 0;
err:
kobject_put(&sess_dev->kobj);
return ret;
}
int rnbd_srv_create_sysfs_files(void)
{
int err;
rnbd_dev_class = class_create(THIS_MODULE, "rnbd-server");
if (IS_ERR(rnbd_dev_class))
return PTR_ERR(rnbd_dev_class);
rnbd_dev = device_create(rnbd_dev_class, NULL,
MKDEV(0, 0), NULL, "ctl");
if (IS_ERR(rnbd_dev)) {
err = PTR_ERR(rnbd_dev);
goto cls_destroy;
}
rnbd_devs_kobj = kobject_create_and_add("devices", &rnbd_dev->kobj);
if (!rnbd_devs_kobj) {
err = -ENOMEM;
goto dev_destroy;
}
return 0;
dev_destroy:
device_destroy(rnbd_dev_class, MKDEV(0, 0));
cls_destroy:
class_destroy(rnbd_dev_class);
return err;
}
void rnbd_srv_destroy_sysfs_files(void)
{
kobject_del(rnbd_devs_kobj);
kobject_put(rnbd_devs_kobj);
device_destroy(rnbd_dev_class, MKDEV(0, 0));
class_destroy(rnbd_dev_class);
}

@@ -0,0 +1,844 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* RDMA Network Block Driver
*
* Copyright (c) 2014 - 2018 ProfitBricks GmbH. All rights reserved.
* Copyright (c) 2018 - 2019 1&1 IONOS Cloud GmbH. All rights reserved.
* Copyright (c) 2019 - 2020 1&1 IONOS SE. All rights reserved.
*/
#undef pr_fmt
#define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
#include <linux/module.h>
#include <linux/blkdev.h>
#include "rnbd-srv.h"
#include "rnbd-srv-dev.h"
MODULE_DESCRIPTION("RDMA Network Block Device Server");
MODULE_LICENSE("GPL");
static u16 port_nr = RTRS_PORT;
module_param_named(port_nr, port_nr, ushort, 0444);
MODULE_PARM_DESC(port_nr,
"The port number the server is listening on (default: "
__stringify(RTRS_PORT)")");
#define DEFAULT_DEV_SEARCH_PATH "/"
static char dev_search_path[PATH_MAX] = DEFAULT_DEV_SEARCH_PATH;
static int dev_search_path_set(const char *val, const struct kernel_param *kp)
{
const char *p = strrchr(val, '\n') ? : val + strlen(val);
if (strlen(val) >= sizeof(dev_search_path))
return -EINVAL;
snprintf(dev_search_path, sizeof(dev_search_path), "%.*s",
(int)(p - val), val);
pr_info("dev_search_path changed to '%s'\n", dev_search_path);
return 0;
}
static struct kparam_string dev_search_path_kparam_str = {
.maxlen = sizeof(dev_search_path),
.string = dev_search_path
};
static const struct kernel_param_ops dev_search_path_ops = {
.set = dev_search_path_set,
.get = param_get_string,
};
module_param_cb(dev_search_path, &dev_search_path_ops,
&dev_search_path_kparam_str, 0444);
MODULE_PARM_DESC(dev_search_path,
"Sets the dev_search_path. When a device is mapped this path is prepended to the device path from the map device operation. If %SESSNAME% is specified in a path, then device will be searched in a session namespace. (default: "
DEFAULT_DEV_SEARCH_PATH ")");
static DEFINE_MUTEX(sess_lock);
static DEFINE_SPINLOCK(dev_lock);
static LIST_HEAD(sess_list);
static LIST_HEAD(dev_list);
struct rnbd_io_private {
struct rtrs_srv_op *id;
struct rnbd_srv_sess_dev *sess_dev;
};
static void rnbd_sess_dev_release(struct kref *kref)
{
struct rnbd_srv_sess_dev *sess_dev;
sess_dev = container_of(kref, struct rnbd_srv_sess_dev, kref);
complete(sess_dev->destroy_comp);
}
static inline void rnbd_put_sess_dev(struct rnbd_srv_sess_dev *sess_dev)
{
kref_put(&sess_dev->kref, rnbd_sess_dev_release);
}
void rnbd_endio(void *priv, int error)
{
struct rnbd_io_private *rnbd_priv = priv;
struct rnbd_srv_sess_dev *sess_dev = rnbd_priv->sess_dev;
rnbd_put_sess_dev(sess_dev);
rtrs_srv_resp_rdma(rnbd_priv->id, error);
kfree(priv);
}
static struct rnbd_srv_sess_dev *
rnbd_get_sess_dev(int dev_id, struct rnbd_srv_session *srv_sess)
{
struct rnbd_srv_sess_dev *sess_dev;
int ret = 0;
rcu_read_lock();
sess_dev = xa_load(&srv_sess->index_idr, dev_id);
if (likely(sess_dev))
ret = kref_get_unless_zero(&sess_dev->kref);
rcu_read_unlock();
if (!sess_dev || !ret)
return ERR_PTR(-ENXIO);
return sess_dev;
}
static int process_rdma(struct rtrs_srv *sess,
struct rnbd_srv_session *srv_sess,
struct rtrs_srv_op *id, void *data, u32 datalen,
const void *usr, size_t usrlen)
{
const struct rnbd_msg_io *msg = usr;
struct rnbd_io_private *priv;
struct rnbd_srv_sess_dev *sess_dev;
u32 dev_id;
int err;
priv = kmalloc(sizeof(*priv), GFP_KERNEL);
if (!priv)
return -ENOMEM;
dev_id = le32_to_cpu(msg->device_id);
sess_dev = rnbd_get_sess_dev(dev_id, srv_sess);
if (IS_ERR(sess_dev)) {
pr_err_ratelimited("Got I/O request on session %s for unknown device id %d\n",
srv_sess->sessname, dev_id);
err = -ENOTCONN;
goto err;
}
priv->sess_dev = sess_dev;
priv->id = id;
err = rnbd_dev_submit_io(sess_dev->rnbd_dev, le64_to_cpu(msg->sector),
data, datalen, le32_to_cpu(msg->bi_size),
le32_to_cpu(msg->rw),
srv_sess->ver < RNBD_PROTO_VER_MAJOR ||
usrlen < sizeof(*msg) ?
0 : le16_to_cpu(msg->prio), priv);
if (unlikely(err)) {
rnbd_srv_err(sess_dev, "Submitting I/O to device failed, err: %d\n",
err);
goto sess_dev_put;
}
return 0;
sess_dev_put:
rnbd_put_sess_dev(sess_dev);
err:
kfree(priv);
return err;
}
static void destroy_device(struct rnbd_srv_dev *dev)
{
WARN_ONCE(!list_empty(&dev->sess_dev_list),
"Device %s is being destroyed but still in use!\n",
dev->id);
spin_lock(&dev_lock);
list_del(&dev->list);
spin_unlock(&dev_lock);
mutex_destroy(&dev->lock);
if (dev->dev_kobj.state_in_sysfs)
/*
* Destroy kobj only if it was really created.
*/
rnbd_srv_destroy_dev_sysfs(dev);
else
kfree(dev);
}
static void destroy_device_cb(struct kref *kref)
{
struct rnbd_srv_dev *dev;
dev = container_of(kref, struct rnbd_srv_dev, kref);
destroy_device(dev);
}
static void rnbd_put_srv_dev(struct rnbd_srv_dev *dev)
{
kref_put(&dev->kref, destroy_device_cb);
}
void rnbd_destroy_sess_dev(struct rnbd_srv_sess_dev *sess_dev)
{
DECLARE_COMPLETION_ONSTACK(dc);
xa_erase(&sess_dev->sess->index_idr, sess_dev->device_id);
synchronize_rcu();
sess_dev->destroy_comp = &dc;
rnbd_put_sess_dev(sess_dev);
wait_for_completion(&dc); /* wait for inflights to drop to zero */
rnbd_dev_close(sess_dev->rnbd_dev);
list_del(&sess_dev->sess_list);
mutex_lock(&sess_dev->dev->lock);
list_del(&sess_dev->dev_list);
if (sess_dev->open_flags & FMODE_WRITE)
sess_dev->dev->open_write_cnt--;
mutex_unlock(&sess_dev->dev->lock);
rnbd_put_srv_dev(sess_dev->dev);
rnbd_srv_info(sess_dev, "Device closed\n");
kfree(sess_dev);
}
static void destroy_sess(struct rnbd_srv_session *srv_sess)
{
struct rnbd_srv_sess_dev *sess_dev, *tmp;
if (list_empty(&srv_sess->sess_dev_list))
goto out;
mutex_lock(&srv_sess->lock);
list_for_each_entry_safe(sess_dev, tmp, &srv_sess->sess_dev_list,
sess_list)
rnbd_srv_destroy_dev_session_sysfs(sess_dev);
mutex_unlock(&srv_sess->lock);
out:
xa_destroy(&srv_sess->index_idr);
bioset_exit(&srv_sess->sess_bio_set);
pr_info("RTRS Session %s disconnected\n", srv_sess->sessname);
mutex_lock(&sess_lock);
list_del(&srv_sess->list);
mutex_unlock(&sess_lock);
mutex_destroy(&srv_sess->lock);
kfree(srv_sess);
}
static int create_sess(struct rtrs_srv *rtrs)
{
struct rnbd_srv_session *srv_sess;
char sessname[NAME_MAX];
int err;
err = rtrs_srv_get_sess_name(rtrs, sessname, sizeof(sessname));
if (err) {
pr_err("rtrs_srv_get_sess_name(%s): %d\n", sessname, err);
return err;
}
srv_sess = kzalloc(sizeof(*srv_sess), GFP_KERNEL);
if (!srv_sess)
return -ENOMEM;
srv_sess->queue_depth = rtrs_srv_get_queue_depth(rtrs);
err = bioset_init(&srv_sess->sess_bio_set, srv_sess->queue_depth,
offsetof(struct rnbd_dev_blk_io, bio),
BIOSET_NEED_BVECS);
if (err) {
pr_err("Allocating srv_session for session %s failed\n",
sessname);
kfree(srv_sess);
return err;
}
xa_init_flags(&srv_sess->index_idr, XA_FLAGS_ALLOC);
INIT_LIST_HEAD(&srv_sess->sess_dev_list);
mutex_init(&srv_sess->lock);
mutex_lock(&sess_lock);
list_add(&srv_sess->list, &sess_list);
mutex_unlock(&sess_lock);
srv_sess->rtrs = rtrs;
strlcpy(srv_sess->sessname, sessname, sizeof(srv_sess->sessname));
rtrs_srv_set_sess_priv(rtrs, srv_sess);
return 0;
}
static int rnbd_srv_link_ev(struct rtrs_srv *rtrs,
enum rtrs_srv_link_ev ev, void *priv)
{
struct rnbd_srv_session *srv_sess = priv;
switch (ev) {
case RTRS_SRV_LINK_EV_CONNECTED:
return create_sess(rtrs);
case RTRS_SRV_LINK_EV_DISCONNECTED:
if (WARN_ON_ONCE(!srv_sess))
return -EINVAL;
destroy_sess(srv_sess);
return 0;
default:
pr_warn("Received unknown RTRS session event %d from session %s\n",
ev, srv_sess->sessname);
return -EINVAL;
}
}
static int process_msg_close(struct rtrs_srv *rtrs,
struct rnbd_srv_session *srv_sess,
void *data, size_t datalen, const void *usr,
size_t usrlen)
{
const struct rnbd_msg_close *close_msg = usr;
struct rnbd_srv_sess_dev *sess_dev;
sess_dev = rnbd_get_sess_dev(le32_to_cpu(close_msg->device_id),
srv_sess);
if (IS_ERR(sess_dev))
return 0;
rnbd_put_sess_dev(sess_dev);
mutex_lock(&srv_sess->lock);
rnbd_srv_destroy_dev_session_sysfs(sess_dev);
mutex_unlock(&srv_sess->lock);
return 0;
}
static int process_msg_open(struct rtrs_srv *rtrs,
struct rnbd_srv_session *srv_sess,
const void *msg, size_t len,
void *data, size_t datalen);
static int process_msg_sess_info(struct rtrs_srv *rtrs,
struct rnbd_srv_session *srv_sess,
const void *msg, size_t len,
void *data, size_t datalen);
static int rnbd_srv_rdma_ev(struct rtrs_srv *rtrs, void *priv,
struct rtrs_srv_op *id, int dir,
void *data, size_t datalen, const void *usr,
size_t usrlen)
{
struct rnbd_srv_session *srv_sess = priv;
const struct rnbd_msg_hdr *hdr = usr;
int ret = 0;
u16 type;
if (WARN_ON_ONCE(!srv_sess))
return -ENODEV;
type = le16_to_cpu(hdr->type);
switch (type) {
case RNBD_MSG_IO:
return process_rdma(rtrs, srv_sess, id, data, datalen, usr,
usrlen);
case RNBD_MSG_CLOSE:
ret = process_msg_close(rtrs, srv_sess, data, datalen,
usr, usrlen);
break;
case RNBD_MSG_OPEN:
ret = process_msg_open(rtrs, srv_sess, usr, usrlen,
data, datalen);
break;
case RNBD_MSG_SESS_INFO:
ret = process_msg_sess_info(rtrs, srv_sess, usr, usrlen,
data, datalen);
break;
default:
pr_warn("Received unexpected message type %d with dir %d from session %s\n",
type, dir, srv_sess->sessname);
return -EINVAL;
}
rtrs_srv_resp_rdma(id, ret);
return 0;
}
static struct rnbd_srv_sess_dev
*rnbd_sess_dev_alloc(struct rnbd_srv_session *srv_sess)
{
struct rnbd_srv_sess_dev *sess_dev;
int error;
sess_dev = kzalloc(sizeof(*sess_dev), GFP_KERNEL);
if (!sess_dev)
return ERR_PTR(-ENOMEM);
error = xa_alloc(&srv_sess->index_idr, &sess_dev->device_id, sess_dev,
xa_limit_32b, GFP_NOWAIT);
if (error < 0) {
pr_warn("Allocating idr failed, err: %d\n", error);
kfree(sess_dev);
return ERR_PTR(error);
}
return sess_dev;
}
static struct rnbd_srv_dev *rnbd_srv_init_srv_dev(const char *id)
{
struct rnbd_srv_dev *dev;
dev = kzalloc(sizeof(*dev), GFP_KERNEL);
if (!dev)
return ERR_PTR(-ENOMEM);
strlcpy(dev->id, id, sizeof(dev->id));
kref_init(&dev->kref);
INIT_LIST_HEAD(&dev->sess_dev_list);
mutex_init(&dev->lock);
return dev;
}
static struct rnbd_srv_dev *
rnbd_srv_find_or_add_srv_dev(struct rnbd_srv_dev *new_dev)
{
struct rnbd_srv_dev *dev;
spin_lock(&dev_lock);
list_for_each_entry(dev, &dev_list, list) {
if (!strncmp(dev->id, new_dev->id, sizeof(dev->id))) {
if (!kref_get_unless_zero(&dev->kref))
/*
* We lost the race, device is almost dead.
* Continue traversing to find a valid one.
*/
continue;
spin_unlock(&dev_lock);
return dev;
}
}
list_add(&new_dev->list, &dev_list);
spin_unlock(&dev_lock);
return new_dev;
}
static int rnbd_srv_check_update_open_perm(struct rnbd_srv_dev *srv_dev,
struct rnbd_srv_session *srv_sess,
enum rnbd_access_mode access_mode)
{
int ret = -EPERM;
mutex_lock(&srv_dev->lock);
switch (access_mode) {
case RNBD_ACCESS_RO:
ret = 0;
break;
case RNBD_ACCESS_RW:
if (srv_dev->open_write_cnt == 0) {
srv_dev->open_write_cnt++;
ret = 0;
} else {
pr_err("Mapping device '%s' for session %s with RW permissions failed. Device already opened as 'RW' by %d client(s), access mode %s.\n",
srv_dev->id, srv_sess->sessname,
srv_dev->open_write_cnt,
rnbd_access_mode_str(access_mode));
}
break;
case RNBD_ACCESS_MIGRATION:
if (srv_dev->open_write_cnt < 2) {
srv_dev->open_write_cnt++;
ret = 0;
} else {
pr_err("Mapping device '%s' for session %s with migration permissions failed. Device already opened as 'RW' by %d client(s), access mode %s.\n",
srv_dev->id, srv_sess->sessname,
srv_dev->open_write_cnt,
rnbd_access_mode_str(access_mode));
}
break;
default:
pr_err("Received mapping request for device '%s' on session %s with invalid access mode: %d\n",
srv_dev->id, srv_sess->sessname, access_mode);
ret = -EINVAL;
}
mutex_unlock(&srv_dev->lock);
return ret;
}
static struct rnbd_srv_dev *
rnbd_srv_get_or_create_srv_dev(struct rnbd_dev *rnbd_dev,
struct rnbd_srv_session *srv_sess,
enum rnbd_access_mode access_mode)
{
int ret;
struct rnbd_srv_dev *new_dev, *dev;
new_dev = rnbd_srv_init_srv_dev(rnbd_dev->name);
if (IS_ERR(new_dev))
return new_dev;
dev = rnbd_srv_find_or_add_srv_dev(new_dev);
if (dev != new_dev)
kfree(new_dev);
ret = rnbd_srv_check_update_open_perm(dev, srv_sess, access_mode);
if (ret) {
rnbd_put_srv_dev(dev);
return ERR_PTR(ret);
}
return dev;
}
static void rnbd_srv_fill_msg_open_rsp(struct rnbd_msg_open_rsp *rsp,
struct rnbd_srv_sess_dev *sess_dev)
{
struct rnbd_dev *rnbd_dev = sess_dev->rnbd_dev;
rsp->hdr.type = cpu_to_le16(RNBD_MSG_OPEN_RSP);
rsp->device_id =
cpu_to_le32(sess_dev->device_id);
rsp->nsectors =
cpu_to_le64(get_capacity(rnbd_dev->bdev->bd_disk));
rsp->logical_block_size =
cpu_to_le16(bdev_logical_block_size(rnbd_dev->bdev));
rsp->physical_block_size =
cpu_to_le16(bdev_physical_block_size(rnbd_dev->bdev));
rsp->max_segments =
cpu_to_le16(rnbd_dev_get_max_segs(rnbd_dev));
rsp->max_hw_sectors =
cpu_to_le32(rnbd_dev_get_max_hw_sects(rnbd_dev));
rsp->max_write_same_sectors =
cpu_to_le32(bdev_write_same(rnbd_dev->bdev));
rsp->max_discard_sectors =
cpu_to_le32(rnbd_dev_get_max_discard_sects(rnbd_dev));
rsp->discard_granularity =
cpu_to_le32(rnbd_dev_get_discard_granularity(rnbd_dev));
rsp->discard_alignment =
cpu_to_le32(rnbd_dev_get_discard_alignment(rnbd_dev));
rsp->secure_discard =
cpu_to_le16(rnbd_dev_get_secure_discard(rnbd_dev));
rsp->rotational =
!blk_queue_nonrot(bdev_get_queue(rnbd_dev->bdev));
}
static struct rnbd_srv_sess_dev *
rnbd_srv_create_set_sess_dev(struct rnbd_srv_session *srv_sess,
const struct rnbd_msg_open *open_msg,
struct rnbd_dev *rnbd_dev, fmode_t open_flags,
struct rnbd_srv_dev *srv_dev)
{
struct rnbd_srv_sess_dev *sdev = rnbd_sess_dev_alloc(srv_sess);
if (IS_ERR(sdev))
return sdev;
kref_init(&sdev->kref);
strlcpy(sdev->pathname, open_msg->dev_name, sizeof(sdev->pathname));
sdev->rnbd_dev = rnbd_dev;
sdev->sess = srv_sess;
sdev->dev = srv_dev;
sdev->open_flags = open_flags;
sdev->access_mode = open_msg->access_mode;
return sdev;
}
static char *rnbd_srv_get_full_path(struct rnbd_srv_session *srv_sess,
const char *dev_name)
{
char *full_path;
char *a, *b;
full_path = kmalloc(PATH_MAX, GFP_KERNEL);
if (!full_path)
return ERR_PTR(-ENOMEM);
/*
* Replace %SESSNAME% with the real session name in order to
* create a device namespace.
*/
a = strnstr(dev_search_path, "%SESSNAME%", sizeof(dev_search_path));
if (a) {
int len = a - dev_search_path;
len = snprintf(full_path, PATH_MAX, "%.*s/%s/%s", len,
dev_search_path, srv_sess->sessname, dev_name);
if (len >= PATH_MAX) {
pr_err("Too long path: %s, %s, %s\n",
dev_search_path, srv_sess->sessname, dev_name);
kfree(full_path);
return ERR_PTR(-EINVAL);
}
} else {
snprintf(full_path, PATH_MAX, "%s/%s",
dev_search_path, dev_name);
}
/* eliminate duplicated slashes */
a = strchr(full_path, '/');
b = a;
while (*b != '\0') {
if (*b == '/' && *a == '/') {
b++;
} else {
a++;
*a = *b;
b++;
}
}
a++;
*a = '\0';
return full_path;
}
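The slash-squeeze loop at the end of rnbd_srv_get_full_path() is easy to misread: `a` trails behind `b`, copying kept characters down and skipping a '/' whenever the last kept character was already '/'. A hypothetical userspace re-implementation (not part of the driver) shows the same loop in isolation:

```c
#include <assert.h>
#include <string.h>

/* Same squeeze loop as above: a trails behind b, copying characters
 * down in place and skipping a '/' whenever the last kept character
 * is also '/'. */
void squeeze_slashes(char *s)
{
	char *a = strchr(s, '/');
	char *b = a;

	if (!a)
		return;
	while (*b != '\0') {
		if (*b == '/' && *a == '/') {
			b++;		/* duplicate slash: skip it */
		} else {
			a++;
			*a = *b;	/* keep this character */
			b++;
		}
	}
	a++;
	*a = '\0';
}
```

Running it on "/dev//rnbd///vol0" yields "/dev/rnbd/vol0", which is why the driver can safely concatenate dev_search_path and the client-supplied name with a '/' in between.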
static int process_msg_sess_info(struct rtrs_srv *rtrs,
struct rnbd_srv_session *srv_sess,
const void *msg, size_t len,
void *data, size_t datalen)
{
const struct rnbd_msg_sess_info *sess_info_msg = msg;
struct rnbd_msg_sess_info_rsp *rsp = data;
srv_sess->ver = min_t(u8, sess_info_msg->ver, RNBD_PROTO_VER_MAJOR);
pr_debug("Session %s using protocol version %d (client version: %d, server version: %d)\n",
srv_sess->sessname, srv_sess->ver,
sess_info_msg->ver, RNBD_PROTO_VER_MAJOR);
rsp->hdr.type = cpu_to_le16(RNBD_MSG_SESS_INFO_RSP);
rsp->ver = srv_sess->ver;
return 0;
}
/**
* find_srv_sess_dev() - check whether a device with this name is already opened
* @srv_sess: the session to search.
* @dev_name: string containing the name of the device.
*
* Return: the struct rnbd_srv_sess_dev if srv_sess has already opened dev_name,
* NULL if the session didn't open the device yet.
*/
static struct rnbd_srv_sess_dev *
find_srv_sess_dev(struct rnbd_srv_session *srv_sess, const char *dev_name)
{
struct rnbd_srv_sess_dev *sess_dev;
if (list_empty(&srv_sess->sess_dev_list))
return NULL;
list_for_each_entry(sess_dev, &srv_sess->sess_dev_list, sess_list)
if (!strcmp(sess_dev->pathname, dev_name))
return sess_dev;
return NULL;
}
static int process_msg_open(struct rtrs_srv *rtrs,
struct rnbd_srv_session *srv_sess,
const void *msg, size_t len,
void *data, size_t datalen)
{
int ret;
struct rnbd_srv_dev *srv_dev;
struct rnbd_srv_sess_dev *srv_sess_dev;
const struct rnbd_msg_open *open_msg = msg;
fmode_t open_flags;
char *full_path;
struct rnbd_dev *rnbd_dev;
struct rnbd_msg_open_rsp *rsp = data;
pr_debug("Open message received: session='%s' path='%s' access_mode=%d\n",
srv_sess->sessname, open_msg->dev_name,
open_msg->access_mode);
open_flags = FMODE_READ;
if (open_msg->access_mode != RNBD_ACCESS_RO)
open_flags |= FMODE_WRITE;
mutex_lock(&srv_sess->lock);
srv_sess_dev = find_srv_sess_dev(srv_sess, open_msg->dev_name);
if (srv_sess_dev)
goto fill_response;
if ((strlen(dev_search_path) + strlen(open_msg->dev_name))
>= PATH_MAX) {
pr_err("Opening device for session %s failed, device path too long. '%s/%s' is longer than PATH_MAX (%d)\n",
srv_sess->sessname, dev_search_path, open_msg->dev_name,
PATH_MAX);
ret = -EINVAL;
goto reject;
}
if (strstr(open_msg->dev_name, "..")) {
pr_err("Opening device for session %s failed, device path %s contains relative path ..\n",
srv_sess->sessname, open_msg->dev_name);
ret = -EINVAL;
goto reject;
}
full_path = rnbd_srv_get_full_path(srv_sess, open_msg->dev_name);
if (IS_ERR(full_path)) {
ret = PTR_ERR(full_path);
pr_err("Opening device '%s' for client %s failed, failed to get device full path, err: %d\n",
open_msg->dev_name, srv_sess->sessname, ret);
goto reject;
}
rnbd_dev = rnbd_dev_open(full_path, open_flags,
&srv_sess->sess_bio_set);
if (IS_ERR(rnbd_dev)) {
pr_err("Opening device '%s' on session %s failed, failed to open the block device, err: %ld\n",
full_path, srv_sess->sessname, PTR_ERR(rnbd_dev));
ret = PTR_ERR(rnbd_dev);
goto free_path;
}
srv_dev = rnbd_srv_get_or_create_srv_dev(rnbd_dev, srv_sess,
open_msg->access_mode);
if (IS_ERR(srv_dev)) {
pr_err("Opening device '%s' on session %s failed, creating srv_dev failed, err: %ld\n",
full_path, srv_sess->sessname, PTR_ERR(srv_dev));
ret = PTR_ERR(srv_dev);
goto rnbd_dev_close;
}
srv_sess_dev = rnbd_srv_create_set_sess_dev(srv_sess, open_msg,
rnbd_dev, open_flags,
srv_dev);
if (IS_ERR(srv_sess_dev)) {
pr_err("Opening device '%s' on session %s failed, creating sess_dev failed, err: %ld\n",
full_path, srv_sess->sessname, PTR_ERR(srv_sess_dev));
ret = PTR_ERR(srv_sess_dev);
goto srv_dev_put;
}
/* Create the srv_dev sysfs files if they haven't been created yet. The
* creation is delayed so that the sysfs files are not created before we
* are sure the device can be opened.
*/
mutex_lock(&srv_dev->lock);
if (!srv_dev->dev_kobj.state_in_sysfs) {
ret = rnbd_srv_create_dev_sysfs(srv_dev, rnbd_dev->bdev,
rnbd_dev->name);
if (ret) {
mutex_unlock(&srv_dev->lock);
rnbd_srv_err(srv_sess_dev,
"Opening device failed, failed to create device sysfs files, err: %d\n",
ret);
goto free_srv_sess_dev;
}
}
ret = rnbd_srv_create_dev_session_sysfs(srv_sess_dev);
if (ret) {
mutex_unlock(&srv_dev->lock);
rnbd_srv_err(srv_sess_dev,
"Opening device failed, failed to create dev client sysfs files, err: %d\n",
ret);
goto free_srv_sess_dev;
}
list_add(&srv_sess_dev->dev_list, &srv_dev->sess_dev_list);
mutex_unlock(&srv_dev->lock);
list_add(&srv_sess_dev->sess_list, &srv_sess->sess_dev_list);
rnbd_srv_info(srv_sess_dev, "Opened device '%s'\n", srv_dev->id);
kfree(full_path);
fill_response:
rnbd_srv_fill_msg_open_rsp(rsp, srv_sess_dev);
mutex_unlock(&srv_sess->lock);
return 0;
free_srv_sess_dev:
xa_erase(&srv_sess->index_idr, srv_sess_dev->device_id);
synchronize_rcu();
kfree(srv_sess_dev);
srv_dev_put:
if (open_msg->access_mode != RNBD_ACCESS_RO) {
mutex_lock(&srv_dev->lock);
srv_dev->open_write_cnt--;
mutex_unlock(&srv_dev->lock);
}
rnbd_put_srv_dev(srv_dev);
rnbd_dev_close:
rnbd_dev_close(rnbd_dev);
free_path:
kfree(full_path);
reject:
mutex_unlock(&srv_sess->lock);
return ret;
}
static struct rtrs_srv_ctx *rtrs_ctx;
static struct rtrs_srv_ops rtrs_ops;
static int __init rnbd_srv_init_module(void)
{
int err;
BUILD_BUG_ON(sizeof(struct rnbd_msg_hdr) != 4);
BUILD_BUG_ON(sizeof(struct rnbd_msg_sess_info) != 36);
BUILD_BUG_ON(sizeof(struct rnbd_msg_sess_info_rsp) != 36);
BUILD_BUG_ON(sizeof(struct rnbd_msg_open) != 264);
BUILD_BUG_ON(sizeof(struct rnbd_msg_close) != 8);
BUILD_BUG_ON(sizeof(struct rnbd_msg_open_rsp) != 56);
rtrs_ops = (struct rtrs_srv_ops) {
.rdma_ev = rnbd_srv_rdma_ev,
.link_ev = rnbd_srv_link_ev,
};
rtrs_ctx = rtrs_srv_open(&rtrs_ops, port_nr);
if (IS_ERR(rtrs_ctx)) {
err = PTR_ERR(rtrs_ctx);
pr_err("rtrs_srv_open(), err: %d\n", err);
return err;
}
err = rnbd_srv_create_sysfs_files();
if (err) {
pr_err("rnbd_srv_create_sysfs_files(), err: %d\n", err);
rtrs_srv_close(rtrs_ctx);
return err;
}
return 0;
}
static void __exit rnbd_srv_cleanup_module(void)
{
rtrs_srv_close(rtrs_ctx);
WARN_ON(!list_empty(&sess_list));
rnbd_srv_destroy_sysfs_files();
}
module_init(rnbd_srv_init_module);
module_exit(rnbd_srv_cleanup_module);

@@ -0,0 +1,78 @@
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
* RDMA Network Block Driver
*
* Copyright (c) 2014 - 2018 ProfitBricks GmbH. All rights reserved.
* Copyright (c) 2018 - 2019 1&1 IONOS Cloud GmbH. All rights reserved.
* Copyright (c) 2019 - 2020 1&1 IONOS SE. All rights reserved.
*/
#ifndef RNBD_SRV_H
#define RNBD_SRV_H
#include <linux/types.h>
#include <linux/idr.h>
#include <linux/kref.h>
#include <rtrs.h>
#include "rnbd-proto.h"
#include "rnbd-log.h"
struct rnbd_srv_session {
/* Entry inside global sess_list */
struct list_head list;
struct rtrs_srv *rtrs;
char sessname[NAME_MAX];
int queue_depth;
struct bio_set sess_bio_set;
struct xarray index_idr;
/* List of struct rnbd_srv_sess_dev */
struct list_head sess_dev_list;
struct mutex lock;
u8 ver;
};
struct rnbd_srv_dev {
/* Entry inside global dev_list */
struct list_head list;
struct kobject dev_kobj;
struct kobject *dev_sessions_kobj;
struct kref kref;
char id[NAME_MAX];
/* List of rnbd_srv_sess_dev structs */
struct list_head sess_dev_list;
struct mutex lock;
int open_write_cnt;
};
/* Structure which binds N devices and N sessions */
struct rnbd_srv_sess_dev {
/* Entry inside rnbd_srv_dev struct */
struct list_head dev_list;
/* Entry inside rnbd_srv_session struct */
struct list_head sess_list;
struct rnbd_dev *rnbd_dev;
struct rnbd_srv_session *sess;
struct rnbd_srv_dev *dev;
struct kobject kobj;
u32 device_id;
fmode_t open_flags;
struct kref kref;
struct completion *destroy_comp;
char pathname[NAME_MAX];
enum rnbd_access_mode access_mode;
};
/* rnbd-srv-sysfs.c */
int rnbd_srv_create_dev_sysfs(struct rnbd_srv_dev *dev,
struct block_device *bdev,
const char *dir_name);
void rnbd_srv_destroy_dev_sysfs(struct rnbd_srv_dev *dev);
int rnbd_srv_create_dev_session_sysfs(struct rnbd_srv_sess_dev *sess_dev);
void rnbd_srv_destroy_dev_session_sysfs(struct rnbd_srv_sess_dev *sess_dev);
int rnbd_srv_create_sysfs_files(void);
void rnbd_srv_destroy_sysfs_files(void);
void rnbd_destroy_sess_dev(struct rnbd_srv_sess_dev *sess_dev);
#endif /* RNBD_SRV_H */

@@ -107,6 +107,7 @@ source "drivers/infiniband/ulp/srpt/Kconfig"
source "drivers/infiniband/ulp/iser/Kconfig"
source "drivers/infiniband/ulp/isert/Kconfig"
source "drivers/infiniband/ulp/rtrs/Kconfig"
source "drivers/infiniband/ulp/opa_vnic/Kconfig"

@@ -8,11 +8,11 @@ obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o
obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o $(user_access-y)
ib_core-y := packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
device.o fmr_pool.o cache.o netlink.o \
device.o cache.o netlink.o \
roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
multicast.o mad.o smi.o agent.o mad_rmpp.o \
nldev.o restrack.o counters.o ib_core_uverbs.o \
trace.o
trace.o lag.o
ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
@@ -36,6 +36,9 @@ ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
uverbs_std_types_flow_action.o uverbs_std_types_dm.o \
uverbs_std_types_mr.o uverbs_std_types_counters.o \
uverbs_uapi.o uverbs_std_types_device.o \
uverbs_std_types_async_fd.o
uverbs_std_types_async_fd.o \
uverbs_std_types_srq.o \
uverbs_std_types_wq.o \
uverbs_std_types_qp.o
ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
ib_uverbs-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o

@@ -371,6 +371,8 @@ static int fetch_ha(const struct dst_entry *dst, struct rdma_dev_addr *dev_addr,
(const void *)&dst_in6->sin6_addr;
sa_family_t family = dst_in->sa_family;
might_sleep();
/* If we have a gateway in IB mode then it must be an IB network */
if (has_gateway(dst, family) && dev_addr->network == RDMA_NETWORK_IB)
return ib_nl_fetch_ha(dev_addr, daddr, seq, family);
@@ -727,6 +729,8 @@ int roce_resolve_route_from_path(struct sa_path_rec *rec,
struct rdma_dev_addr dev_addr = {};
int ret;
might_sleep();
if (rec->roce.route_resolved)
return 0;

@@ -66,6 +66,8 @@ static const char * const ibcm_rej_reason_strs[] = {
[IB_CM_REJ_INVALID_CLASS_VERSION] = "invalid class version",
[IB_CM_REJ_INVALID_FLOW_LABEL] = "invalid flow label",
[IB_CM_REJ_INVALID_ALT_FLOW_LABEL] = "invalid alt flow label",
[IB_CM_REJ_VENDOR_OPTION_NOT_SUPPORTED] =
"vendor option is not supported",
};
const char *__attribute_const__ ibcm_reject_msg(int reason)
@@ -81,8 +83,11 @@ const char *__attribute_const__ ibcm_reject_msg(int reason)
EXPORT_SYMBOL(ibcm_reject_msg);
struct cm_id_private;
static void cm_add_one(struct ib_device *device);
struct cm_work;
static int cm_add_one(struct ib_device *device);
static void cm_remove_one(struct ib_device *device, void *client_data);
static void cm_process_work(struct cm_id_private *cm_id_priv,
struct cm_work *work);
static int cm_send_sidr_rep_locked(struct cm_id_private *cm_id_priv,
struct ib_cm_sidr_rep_param *param);
static int cm_send_dreq_locked(struct cm_id_private *cm_id_priv,
@@ -287,6 +292,8 @@ struct cm_id_private {
struct list_head work_list;
atomic_t work_count;
struct rdma_ucm_ece ece;
};
static void cm_work_handler(struct work_struct *work);
@@ -474,24 +481,19 @@ static int cm_init_av_for_response(struct cm_port *port, struct ib_wc *wc,
grh, &av->ah_attr);
}
static int add_cm_id_to_port_list(struct cm_id_private *cm_id_priv,
struct cm_av *av,
struct cm_port *port)
static void add_cm_id_to_port_list(struct cm_id_private *cm_id_priv,
struct cm_av *av, struct cm_port *port)
{
unsigned long flags;
int ret = 0;
spin_lock_irqsave(&cm.lock, flags);
if (&cm_id_priv->av == av)
list_add_tail(&cm_id_priv->prim_list, &port->cm_priv_prim_list);
else if (&cm_id_priv->alt_av == av)
list_add_tail(&cm_id_priv->altr_list, &port->cm_priv_altr_list);
else
ret = -EINVAL;
WARN_ON(true);
spin_unlock_irqrestore(&cm.lock, flags);
return ret;
}
static struct cm_port *
@@ -572,12 +574,7 @@ static int cm_init_av_by_path(struct sa_path_rec *path,
return ret;
av->timeout = path->packet_life_time + 1;
ret = add_cm_id_to_port_list(cm_id_priv, av, port);
if (ret) {
rdma_destroy_ah_attr(&new_ah_attr);
return ret;
}
add_cm_id_to_port_list(cm_id_priv, av, port);
rdma_move_ah_attr(&av->ah_attr, &new_ah_attr);
return 0;
}
@@ -587,11 +584,6 @@ static u32 cm_local_id(__be32 local_id)
return (__force u32) (local_id ^ cm.random_id_operand);
}
static void cm_free_id(__be32 local_id)
{
xa_erase_irq(&cm.local_id_table, cm_local_id(local_id));
}
static struct cm_id_private *cm_acquire_id(__be32 local_id, __be32 remote_id)
{
struct cm_id_private *cm_id_priv;
@@ -698,9 +690,10 @@ static struct cm_id_private * cm_find_listen(struct ib_device *device,
cm_id_priv = rb_entry(node, struct cm_id_private, service_node);
if ((cm_id_priv->id.service_mask & service_id) ==
cm_id_priv->id.service_id &&
(cm_id_priv->id.device == device))
(cm_id_priv->id.device == device)) {
refcount_inc(&cm_id_priv->refcount);
return cm_id_priv;
}
if (device < cm_id_priv->id.device)
node = node->rb_left;
else if (device > cm_id_priv->id.device)
@@ -745,12 +738,14 @@ static struct cm_timewait_info * cm_insert_remote_id(struct cm_timewait_info
return NULL;
}
static struct cm_timewait_info * cm_find_remote_id(__be64 remote_ca_guid,
__be32 remote_id)
static struct cm_id_private *cm_find_remote_id(__be64 remote_ca_guid,
__be32 remote_id)
{
struct rb_node *node = cm.remote_id_table.rb_node;
struct cm_timewait_info *timewait_info;
struct cm_id_private *res = NULL;
spin_lock_irq(&cm.lock);
while (node) {
timewait_info = rb_entry(node, struct cm_timewait_info,
remote_id_node);
@@ -762,10 +757,14 @@ static struct cm_timewait_info * cm_find_remote_id(__be64 remote_ca_guid,
node = node->rb_left;
else if (be64_gt(remote_ca_guid, timewait_info->remote_ca_guid))
node = node->rb_right;
else
return timewait_info;
else {
res = cm_acquire_id(timewait_info->work.local_id,
timewait_info->work.remote_id);
break;
}
}
return NULL;
spin_unlock_irq(&cm.lock);
return res;
}
static struct cm_timewait_info * cm_insert_remote_qpn(struct cm_timewait_info
@@ -917,6 +916,35 @@ static void cm_free_work(struct cm_work *work)
kfree(work);
}
static void cm_queue_work_unlock(struct cm_id_private *cm_id_priv,
struct cm_work *work)
{
bool immediate;
/*
* To deliver the event to the user callback we have to drop the
* spinlock; however, we need to ensure that the user callback is
* single threaded and receives events in temporal order. If there are
* already events being processed then thread new events onto a list,
* the thread currently processing will pick them up.
*/
immediate = atomic_inc_and_test(&cm_id_priv->work_count);
if (!immediate) {
list_add_tail(&work->list, &cm_id_priv->work_list);
/*
* This routine always consumes the incoming reference. Once queued
* to the work_list, a reference is held by the thread currently
* running cm_process_work() and this reference is not needed.
*/
cm_deref_id(cm_id_priv);
}
spin_unlock_irq(&cm_id_priv->lock);
if (immediate)
cm_process_work(cm_id_priv, work);
}
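The dispatch decision in cm_queue_work_unlock() hinges on atomic_inc_and_test(): the CM initializes work_count to -1, so the first arriving event increments it to 0 and runs the callback directly, while concurrent events see a positive count and are queued for the running thread. A hypothetical userspace sketch of just that counter protocol (names are illustrative, not the driver's):

```c
#include <assert.h>
#include <stdatomic.h>

/* work_count starts at -1, as the CM does for a fresh cm_id */
static atomic_int work_count = -1;

/* Mirrors atomic_inc_and_test(): increment and test for zero.
 * Returns 1 when the caller should run the callback itself,
 * 0 when the event must be queued for the running thread. */
int queue_or_run(void)
{
	return atomic_fetch_add(&work_count, 1) + 1 == 0;
}

/* Mirrors the decrement done after each processed event in
 * cm_process_work(): nonzero means another queued event remains. */
int more_work_pending(void)
{
	return atomic_fetch_sub(&work_count, 1) - 1 >= 0;
}
```

With this shape the callback is single threaded by construction: only the thread that observed the zero crossing runs events, and it keeps draining the list until the counter falls back below zero.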
static inline int cm_convert_to_ms(int iba_time)
{
/* approximate conversion to ms from 4.096us x 2^iba_time */
@@ -942,8 +970,10 @@ static u8 cm_ack_timeout(u8 ca_ack_delay, u8 packet_life_time)
return min(31, ack_timeout);
}
static void cm_cleanup_timewait(struct cm_timewait_info *timewait_info)
static void cm_remove_remote(struct cm_id_private *cm_id_priv)
{
struct cm_timewait_info *timewait_info = cm_id_priv->timewait_info;
if (timewait_info->inserted_remote_id) {
rb_erase(&timewait_info->remote_id_node, &cm.remote_id_table);
timewait_info->inserted_remote_id = 0;
@@ -982,7 +1012,7 @@ static void cm_enter_timewait(struct cm_id_private *cm_id_priv)
return;
spin_lock_irqsave(&cm.lock, flags);
cm_cleanup_timewait(cm_id_priv->timewait_info);
cm_remove_remote(cm_id_priv);
list_add_tail(&cm_id_priv->timewait_info->list, &cm.timewait_list);
spin_unlock_irqrestore(&cm.lock, flags);
@@ -1001,6 +1031,11 @@ static void cm_enter_timewait(struct cm_id_private *cm_id_priv)
msecs_to_jiffies(wait_time));
spin_unlock_irqrestore(&cm.lock, flags);
/*
* The timewait_info is converted into a work and gets freed during
* cm_free_work() in cm_timewait_handler().
*/
BUILD_BUG_ON(offsetof(struct cm_timewait_info, work) != 0);
cm_id_priv->timewait_info = NULL;
}
@@ -1013,7 +1048,7 @@ static void cm_reset_to_idle(struct cm_id_private *cm_id_priv)
cm_id_priv->id.state = IB_CM_IDLE;
if (cm_id_priv->timewait_info) {
spin_lock_irqsave(&cm.lock, flags);
cm_cleanup_timewait(cm_id_priv->timewait_info);
cm_remove_remote(cm_id_priv);
spin_unlock_irqrestore(&cm.lock, flags);
kfree(cm_id_priv->timewait_info);
cm_id_priv->timewait_info = NULL;
@@ -1076,7 +1111,9 @@ retest:
case IB_CM_REP_SENT:
case IB_CM_MRA_REP_RCVD:
ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
/* Fall through */
cm_send_rej_locked(cm_id_priv, IB_CM_REJ_CONSUMER_DEFINED, NULL,
0, NULL, 0);
goto retest;
case IB_CM_MRA_REQ_SENT:
case IB_CM_REP_RCVD:
case IB_CM_MRA_REP_SENT:
@@ -1101,7 +1138,7 @@ retest:
case IB_CM_TIMEWAIT:
/*
* The cm_acquire_id in cm_timewait_handler will stop working
* once we do cm_free_id() below, so just move to idle here for
* once we do xa_erase below, so just move to idle here for
* consistency.
*/
cm_id->state = IB_CM_IDLE;
@@ -1114,7 +1151,7 @@ retest:
spin_lock(&cm.lock);
/* Required for cleanup paths related cm_req_handler() */
if (cm_id_priv->timewait_info) {
cm_cleanup_timewait(cm_id_priv->timewait_info);
cm_remove_remote(cm_id_priv);
kfree(cm_id_priv->timewait_info);
cm_id_priv->timewait_info = NULL;
}
@@ -1131,7 +1168,7 @@ retest:
spin_unlock(&cm.lock);
spin_unlock_irq(&cm_id_priv->lock);
cm_free_id(cm_id->local_id);
xa_erase_irq(&cm.local_id_table, cm_local_id(cm_id->local_id));
cm_deref_id(cm_id_priv);
wait_for_completion(&cm_id_priv->comp);
while ((work = cm_dequeue_work(cm_id_priv)) != NULL)
@@ -1287,6 +1324,13 @@ static void cm_format_mad_hdr(struct ib_mad_hdr *hdr,
hdr->tid = tid;
}
static void cm_format_mad_ece_hdr(struct ib_mad_hdr *hdr, __be16 attr_id,
__be64 tid, u32 attr_mod)
{
cm_format_mad_hdr(hdr, attr_id, tid);
hdr->attr_mod = cpu_to_be32(attr_mod);
}
static void cm_format_req(struct cm_req_msg *req_msg,
struct cm_id_private *cm_id_priv,
struct ib_cm_req_param *param)
@@ -1299,8 +1343,8 @@ static void cm_format_req(struct cm_req_msg *req_msg,
pri_ext = opa_is_extended_lid(pri_path->opa.dlid,
pri_path->opa.slid);
cm_format_mad_hdr(&req_msg->hdr, CM_REQ_ATTR_ID,
cm_form_tid(cm_id_priv));
cm_format_mad_ece_hdr(&req_msg->hdr, CM_REQ_ATTR_ID,
cm_form_tid(cm_id_priv), param->ece.attr_mod);
IBA_SET(CM_REQ_LOCAL_COMM_ID, req_msg,
be32_to_cpu(cm_id_priv->id.local_id));
@@ -1423,6 +1467,7 @@ static void cm_format_req(struct cm_req_msg *req_msg,
cm_ack_timeout(cm_id_priv->av.port->cm_dev->ack_delay,
alt_path->packet_life_time));
}
IBA_SET(CM_REQ_VENDOR_ID, req_msg, param->ece.vendor_id);
if (param->private_data && param->private_data_len)
IBA_SET_MEM(CM_REQ_PRIVATE_DATA, req_msg, param->private_data,
@@ -1779,6 +1824,9 @@ static void cm_format_req_event(struct cm_work *work,
param->rnr_retry_count = IBA_GET(CM_REQ_RNR_RETRY_COUNT, req_msg);
param->srq = IBA_GET(CM_REQ_SRQ, req_msg);
param->ppath_sgid_attr = cm_id_priv->av.ah_attr.grh.sgid_attr;
param->ece.vendor_id = IBA_GET(CM_REQ_VENDOR_ID, req_msg);
param->ece.attr_mod = be32_to_cpu(req_msg->hdr.attr_mod);
work->cm_event.private_data =
IBA_GET_MEM_PTR(CM_REQ_PRIVATE_DATA, req_msg);
}
@@ -1927,7 +1975,6 @@ static struct cm_id_private * cm_match_req(struct cm_work *work,
struct cm_id_private *listen_cm_id_priv, *cur_cm_id_priv;
struct cm_timewait_info *timewait_info;
struct cm_req_msg *req_msg;
struct ib_cm_id *cm_id;
req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
@@ -1948,7 +1995,7 @@ static struct cm_id_private * cm_match_req(struct cm_work *work,
/* Check for stale connections. */
timewait_info = cm_insert_remote_qpn(cm_id_priv->timewait_info);
if (timewait_info) {
cm_cleanup_timewait(cm_id_priv->timewait_info);
cm_remove_remote(cm_id_priv);
cur_cm_id_priv = cm_acquire_id(timewait_info->work.local_id,
timewait_info->work.remote_id);
@@ -1957,8 +2004,7 @@ static struct cm_id_private * cm_match_req(struct cm_work *work,
IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REQ,
NULL, 0);
if (cur_cm_id_priv) {
cm_id = &cur_cm_id_priv->id;
ib_send_cm_dreq(cm_id, NULL, 0);
ib_send_cm_dreq(&cur_cm_id_priv->id, NULL, 0);
cm_deref_id(cur_cm_id_priv);
}
return NULL;
@@ -1969,14 +2015,13 @@ static struct cm_id_private * cm_match_req(struct cm_work *work,
cm_id_priv->id.device,
cpu_to_be64(IBA_GET(CM_REQ_SERVICE_ID, req_msg)));
if (!listen_cm_id_priv) {
cm_cleanup_timewait(cm_id_priv->timewait_info);
cm_remove_remote(cm_id_priv);
spin_unlock_irq(&cm.lock);
cm_issue_rej(work->port, work->mad_recv_wc,
IB_CM_REJ_INVALID_SERVICE_ID, CM_MSG_RESPONSE_REQ,
NULL, 0);
return NULL;
}
refcount_inc(&listen_cm_id_priv->refcount);
spin_unlock_irq(&cm.lock);
return listen_cm_id_priv;
}
@@ -2153,9 +2198,7 @@ static int cm_req_handler(struct cm_work *work)
/* Refcount belongs to the event, pairs with cm_process_work() */
refcount_inc(&cm_id_priv->refcount);
atomic_inc(&cm_id_priv->work_count);
spin_unlock_irq(&cm_id_priv->lock);
cm_process_work(cm_id_priv, work);
cm_queue_work_unlock(cm_id_priv, work);
/*
* Since this ID was just created and was not made visible to other MAD
* handlers until the cm_finalize_id() above we know that the
@@ -2176,7 +2219,8 @@ static void cm_format_rep(struct cm_rep_msg *rep_msg,
struct cm_id_private *cm_id_priv,
struct ib_cm_rep_param *param)
{
cm_format_mad_hdr(&rep_msg->hdr, CM_REP_ATTR_ID, cm_id_priv->tid);
cm_format_mad_ece_hdr(&rep_msg->hdr, CM_REP_ATTR_ID, cm_id_priv->tid,
param->ece.attr_mod);
IBA_SET(CM_REP_LOCAL_COMM_ID, rep_msg,
be32_to_cpu(cm_id_priv->id.local_id));
IBA_SET(CM_REP_REMOTE_COMM_ID, rep_msg,
@@ -2203,6 +2247,10 @@ static void cm_format_rep(struct cm_rep_msg *rep_msg,
IBA_SET(CM_REP_LOCAL_EE_CONTEXT_NUMBER, rep_msg, param->qp_num);
}
IBA_SET(CM_REP_VENDOR_ID_L, rep_msg, param->ece.vendor_id);
IBA_SET(CM_REP_VENDOR_ID_M, rep_msg, param->ece.vendor_id >> 8);
IBA_SET(CM_REP_VENDOR_ID_H, rep_msg, param->ece.vendor_id >> 16);
if (param->private_data && param->private_data_len)
IBA_SET_MEM(CM_REP_PRIVATE_DATA, rep_msg, param->private_data,
param->private_data_len);
@@ -2350,6 +2398,11 @@ static void cm_format_rep_event(struct cm_work *work, enum ib_qp_type qp_type)
param->flow_control = IBA_GET(CM_REP_END_TO_END_FLOW_CONTROL, rep_msg);
param->rnr_retry_count = IBA_GET(CM_REP_RNR_RETRY_COUNT, rep_msg);
param->srq = IBA_GET(CM_REP_SRQ, rep_msg);
param->ece.vendor_id = IBA_GET(CM_REP_VENDOR_ID_H, rep_msg) << 16;
param->ece.vendor_id |= IBA_GET(CM_REP_VENDOR_ID_M, rep_msg) << 8;
param->ece.vendor_id |= IBA_GET(CM_REP_VENDOR_ID_L, rep_msg);
param->ece.attr_mod = be32_to_cpu(rep_msg->hdr.attr_mod);
work->cm_event.private_data =
IBA_GET_MEM_PTR(CM_REP_PRIVATE_DATA, rep_msg);
}
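The new ECE vendor-ID handling above splits a 24-bit value across three one-byte wire fields on send and reassembles it on receive. A user-space sketch of that round trip, with hypothetical helpers standing in for the kernel's IBA_SET()/IBA_GET() accessors:

```c
#include <stdint.h>

/* Stand-in for the IBA_SET() calls in cm_format_rep(): split the
 * 24-bit ECE vendor ID into the L/M/H one-byte fields. */
static void pack_vendor_id(uint32_t vendor_id, uint8_t wire[3])
{
	wire[0] = vendor_id & 0xFF;		/* CM_REP_VENDOR_ID_L */
	wire[1] = (vendor_id >> 8) & 0xFF;	/* CM_REP_VENDOR_ID_M */
	wire[2] = (vendor_id >> 16) & 0xFF;	/* CM_REP_VENDOR_ID_H */
}

/* Stand-in for the IBA_GET() reassembly in cm_format_rep_event(). */
static uint32_t unpack_vendor_id(const uint8_t wire[3])
{
	return ((uint32_t)wire[2] << 16) | ((uint32_t)wire[1] << 8) | wire[0];
}
```

Packing then unpacking recovers any 24-bit vendor ID unchanged; the attr_mod half of the ECE data rides in the MAD header instead, as cm_format_mad_ece_hdr() shows.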
@@ -2404,7 +2457,6 @@ static int cm_rep_handler(struct cm_work *work)
struct cm_rep_msg *rep_msg;
int ret;
struct cm_id_private *cur_cm_id_priv;
struct ib_cm_id *cm_id;
struct cm_timewait_info *timewait_info;
rep_msg = (struct cm_rep_msg *)work->mad_recv_wc->recv_buf.mad;
@@ -2454,9 +2506,7 @@ static int cm_rep_handler(struct cm_work *work)
/* Check for a stale connection. */
timewait_info = cm_insert_remote_qpn(cm_id_priv->timewait_info);
if (timewait_info) {
rb_erase(&cm_id_priv->timewait_info->remote_id_node,
&cm.remote_id_table);
cm_id_priv->timewait_info->inserted_remote_id = 0;
cm_remove_remote(cm_id_priv);
cur_cm_id_priv = cm_acquire_id(timewait_info->work.local_id,
timewait_info->work.remote_id);
@@ -2472,8 +2522,7 @@ static int cm_rep_handler(struct cm_work *work)
IBA_GET(CM_REP_REMOTE_COMM_ID, rep_msg));
if (cur_cm_id_priv) {
cm_id = &cur_cm_id_priv->id;
ib_send_cm_dreq(cm_id, NULL, 0);
ib_send_cm_dreq(&cur_cm_id_priv->id, NULL, 0);
cm_deref_id(cur_cm_id_priv);
}
@@ -2501,15 +2550,7 @@ static int cm_rep_handler(struct cm_work *work)
cm_id_priv->alt_av.timeout - 1);
ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
ret = atomic_inc_and_test(&cm_id_priv->work_count);
if (!ret)
list_add_tail(&work->list, &cm_id_priv->work_list);
spin_unlock_irq(&cm_id_priv->lock);
if (ret)
cm_process_work(cm_id_priv, work);
else
cm_deref_id(cm_id_priv);
cm_queue_work_unlock(cm_id_priv, work);
return 0;
error:
@@ -2520,7 +2561,6 @@ error:
static int cm_establish_handler(struct cm_work *work)
{
struct cm_id_private *cm_id_priv;
int ret;
/* See comment in cm_establish about lookup. */
cm_id_priv = cm_acquire_id(work->local_id, work->remote_id);
@@ -2534,15 +2574,7 @@ static int cm_establish_handler(struct cm_work *work)
}
ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
ret = atomic_inc_and_test(&cm_id_priv->work_count);
if (!ret)
list_add_tail(&work->list, &cm_id_priv->work_list);
spin_unlock_irq(&cm_id_priv->lock);
if (ret)
cm_process_work(cm_id_priv, work);
else
cm_deref_id(cm_id_priv);
cm_queue_work_unlock(cm_id_priv, work);
return 0;
out:
cm_deref_id(cm_id_priv);
@@ -2553,7 +2585,6 @@ static int cm_rtu_handler(struct cm_work *work)
{
struct cm_id_private *cm_id_priv;
struct cm_rtu_msg *rtu_msg;
int ret;
rtu_msg = (struct cm_rtu_msg *)work->mad_recv_wc->recv_buf.mad;
cm_id_priv = cm_acquire_id(
@@ -2576,15 +2607,7 @@ static int cm_rtu_handler(struct cm_work *work)
cm_id_priv->id.state = IB_CM_ESTABLISHED;
ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
ret = atomic_inc_and_test(&cm_id_priv->work_count);
if (!ret)
list_add_tail(&work->list, &cm_id_priv->work_list);
spin_unlock_irq(&cm_id_priv->lock);
if (ret)
cm_process_work(cm_id_priv, work);
else
cm_deref_id(cm_id_priv);
cm_queue_work_unlock(cm_id_priv, work);
return 0;
out:
cm_deref_id(cm_id_priv);
@@ -2777,7 +2800,6 @@ static int cm_dreq_handler(struct cm_work *work)
struct cm_id_private *cm_id_priv;
struct cm_dreq_msg *dreq_msg;
struct ib_mad_send_buf *msg = NULL;
int ret;
dreq_msg = (struct cm_dreq_msg *)work->mad_recv_wc->recv_buf.mad;
cm_id_priv = cm_acquire_id(
@@ -2842,15 +2864,7 @@ static int cm_dreq_handler(struct cm_work *work)
}
cm_id_priv->id.state = IB_CM_DREQ_RCVD;
cm_id_priv->tid = dreq_msg->hdr.tid;
ret = atomic_inc_and_test(&cm_id_priv->work_count);
if (!ret)
list_add_tail(&work->list, &cm_id_priv->work_list);
spin_unlock_irq(&cm_id_priv->lock);
if (ret)
cm_process_work(cm_id_priv, work);
else
cm_deref_id(cm_id_priv);
cm_queue_work_unlock(cm_id_priv, work);
return 0;
unlock: spin_unlock_irq(&cm_id_priv->lock);
@@ -2862,7 +2876,6 @@ static int cm_drep_handler(struct cm_work *work)
{
struct cm_id_private *cm_id_priv;
struct cm_drep_msg *drep_msg;
int ret;
drep_msg = (struct cm_drep_msg *)work->mad_recv_wc->recv_buf.mad;
cm_id_priv = cm_acquire_id(
@@ -2883,15 +2896,7 @@ static int cm_drep_handler(struct cm_work *work)
cm_enter_timewait(cm_id_priv);
ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
ret = atomic_inc_and_test(&cm_id_priv->work_count);
if (!ret)
list_add_tail(&work->list, &cm_id_priv->work_list);
spin_unlock_irq(&cm_id_priv->lock);
if (ret)
cm_process_work(cm_id_priv, work);
else
cm_deref_id(cm_id_priv);
cm_queue_work_unlock(cm_id_priv, work);
return 0;
out:
cm_deref_id(cm_id_priv);
@@ -2987,24 +2992,15 @@ static void cm_format_rej_event(struct cm_work *work)
static struct cm_id_private * cm_acquire_rejected_id(struct cm_rej_msg *rej_msg)
{
struct cm_timewait_info *timewait_info;
struct cm_id_private *cm_id_priv;
__be32 remote_id;
remote_id = cpu_to_be32(IBA_GET(CM_REJ_LOCAL_COMM_ID, rej_msg));
if (IBA_GET(CM_REJ_REASON, rej_msg) == IB_CM_REJ_TIMEOUT) {
spin_lock_irq(&cm.lock);
timewait_info = cm_find_remote_id(
cm_id_priv = cm_find_remote_id(
*((__be64 *)IBA_GET_MEM_PTR(CM_REJ_ARI, rej_msg)),
remote_id);
if (!timewait_info) {
spin_unlock_irq(&cm.lock);
return NULL;
}
cm_id_priv =
cm_acquire_id(timewait_info->work.local_id, remote_id);
spin_unlock_irq(&cm.lock);
} else if (IBA_GET(CM_REJ_MESSAGE_REJECTED, rej_msg) ==
CM_MSG_RESPONSE_REQ)
cm_id_priv = cm_acquire_id(
@@ -3022,7 +3018,6 @@ static int cm_rej_handler(struct cm_work *work)
{
struct cm_id_private *cm_id_priv;
struct cm_rej_msg *rej_msg;
int ret;
rej_msg = (struct cm_rej_msg *)work->mad_recv_wc->recv_buf.mad;
cm_id_priv = cm_acquire_rejected_id(rej_msg);
@@ -3068,19 +3063,10 @@ static int cm_rej_handler(struct cm_work *work)
__func__, be32_to_cpu(cm_id_priv->id.local_id),
cm_id_priv->id.state);
spin_unlock_irq(&cm_id_priv->lock);
ret = -EINVAL;
goto out;
}
ret = atomic_inc_and_test(&cm_id_priv->work_count);
if (!ret)
list_add_tail(&work->list, &cm_id_priv->work_list);
spin_unlock_irq(&cm_id_priv->lock);
if (ret)
cm_process_work(cm_id_priv, work);
else
cm_deref_id(cm_id_priv);
cm_queue_work_unlock(cm_id_priv, work);
return 0;
out:
cm_deref_id(cm_id_priv);
@@ -3190,7 +3176,7 @@ static int cm_mra_handler(struct cm_work *work)
{
struct cm_id_private *cm_id_priv;
struct cm_mra_msg *mra_msg;
int timeout, ret;
int timeout;
mra_msg = (struct cm_mra_msg *)work->mad_recv_wc->recv_buf.mad;
cm_id_priv = cm_acquire_mraed_id(mra_msg);
@@ -3250,15 +3236,7 @@ static int cm_mra_handler(struct cm_work *work)
cm_id_priv->msg->context[1] = (void *) (unsigned long)
cm_id_priv->id.state;
ret = atomic_inc_and_test(&cm_id_priv->work_count);
if (!ret)
list_add_tail(&work->list, &cm_id_priv->work_list);
spin_unlock_irq(&cm_id_priv->lock);
if (ret)
cm_process_work(cm_id_priv, work);
else
cm_deref_id(cm_id_priv);
cm_queue_work_unlock(cm_id_priv, work);
return 0;
out:
spin_unlock_irq(&cm_id_priv->lock);
@@ -3393,15 +3371,7 @@ static int cm_lap_handler(struct cm_work *work)
cm_id_priv->id.lap_state = IB_CM_LAP_RCVD;
cm_id_priv->tid = lap_msg->hdr.tid;
ret = atomic_inc_and_test(&cm_id_priv->work_count);
if (!ret)
list_add_tail(&work->list, &cm_id_priv->work_list);
spin_unlock_irq(&cm_id_priv->lock);
if (ret)
cm_process_work(cm_id_priv, work);
else
cm_deref_id(cm_id_priv);
cm_queue_work_unlock(cm_id_priv, work);
return 0;
unlock: spin_unlock_irq(&cm_id_priv->lock);
@@ -3413,7 +3383,6 @@ static int cm_apr_handler(struct cm_work *work)
{
struct cm_id_private *cm_id_priv;
struct cm_apr_msg *apr_msg;
int ret;
/* Currently Alternate path messages are not supported for
 * RoCE link layer.
 */
@@ -3448,16 +3417,7 @@ static int cm_apr_handler(struct cm_work *work)
cm_id_priv->id.lap_state = IB_CM_LAP_IDLE;
ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
cm_id_priv->msg = NULL;
ret = atomic_inc_and_test(&cm_id_priv->work_count);
if (!ret)
list_add_tail(&work->list, &cm_id_priv->work_list);
spin_unlock_irq(&cm_id_priv->lock);
if (ret)
cm_process_work(cm_id_priv, work);
else
cm_deref_id(cm_id_priv);
cm_queue_work_unlock(cm_id_priv, work);
return 0;
out:
cm_deref_id(cm_id_priv);
@@ -3468,7 +3428,6 @@ static int cm_timewait_handler(struct cm_work *work)
{
struct cm_timewait_info *timewait_info;
struct cm_id_private *cm_id_priv;
int ret;
timewait_info = container_of(work, struct cm_timewait_info, work);
spin_lock_irq(&cm.lock);
@@ -3487,15 +3446,7 @@ static int cm_timewait_handler(struct cm_work *work)
goto out;
}
cm_id_priv->id.state = IB_CM_IDLE;
ret = atomic_inc_and_test(&cm_id_priv->work_count);
if (!ret)
list_add_tail(&work->list, &cm_id_priv->work_list);
spin_unlock_irq(&cm_id_priv->lock);
if (ret)
cm_process_work(cm_id_priv, work);
else
cm_deref_id(cm_id_priv);
cm_queue_work_unlock(cm_id_priv, work);
return 0;
out:
cm_deref_id(cm_id_priv);
@@ -3642,7 +3593,6 @@ static int cm_sidr_req_handler(struct cm_work *work)
.status = IB_SIDR_UNSUPPORTED });
goto out; /* No match. */
}
refcount_inc(&listen_cm_id_priv->refcount);
spin_unlock_irq(&cm.lock);
cm_id_priv->id.cm_handler = listen_cm_id_priv->id.cm_handler;
@@ -3674,8 +3624,8 @@ static void cm_format_sidr_rep(struct cm_sidr_rep_msg *sidr_rep_msg,
struct cm_id_private *cm_id_priv,
struct ib_cm_sidr_rep_param *param)
{
cm_format_mad_hdr(&sidr_rep_msg->hdr, CM_SIDR_REP_ATTR_ID,
cm_id_priv->tid);
cm_format_mad_ece_hdr(&sidr_rep_msg->hdr, CM_SIDR_REP_ATTR_ID,
cm_id_priv->tid, param->ece.attr_mod);
IBA_SET(CM_SIDR_REP_REQUESTID, sidr_rep_msg,
be32_to_cpu(cm_id_priv->id.remote_id));
IBA_SET(CM_SIDR_REP_STATUS, sidr_rep_msg, param->status);
@@ -3683,6 +3633,10 @@ static void cm_format_sidr_rep(struct cm_sidr_rep_msg *sidr_rep_msg,
IBA_SET(CM_SIDR_REP_SERVICEID, sidr_rep_msg,
be64_to_cpu(cm_id_priv->id.service_id));
IBA_SET(CM_SIDR_REP_Q_KEY, sidr_rep_msg, param->qkey);
IBA_SET(CM_SIDR_REP_VENDOR_ID_L, sidr_rep_msg,
param->ece.vendor_id & 0xFF);
IBA_SET(CM_SIDR_REP_VENDOR_ID_H, sidr_rep_msg,
(param->ece.vendor_id >> 8) & 0xFF);
if (param->info && param->info_length)
IBA_SET_MEM(CM_SIDR_REP_ADDITIONAL_INFORMATION, sidr_rep_msg,
@@ -4384,7 +4338,7 @@ static void cm_remove_port_fs(struct cm_port *port)
}
static void cm_add_one(struct ib_device *ib_device)
static int cm_add_one(struct ib_device *ib_device)
{
struct cm_device *cm_dev;
struct cm_port *port;
@@ -4403,7 +4357,7 @@ static void cm_add_one(struct ib_device *ib_device)
cm_dev = kzalloc(struct_size(cm_dev, port, ib_device->phys_port_cnt),
GFP_KERNEL);
if (!cm_dev)
return;
return -ENOMEM;
cm_dev->ib_device = ib_device;
cm_dev->ack_delay = ib_device->attrs.local_ca_ack_delay;
@@ -4415,8 +4369,10 @@ static void cm_add_one(struct ib_device *ib_device)
continue;
port = kzalloc(sizeof *port, GFP_KERNEL);
if (!port)
if (!port) {
ret = -ENOMEM;
goto error1;
}
cm_dev->port[i-1] = port;
port->cm_dev = cm_dev;
@@ -4437,8 +4393,10 @@ static void cm_add_one(struct ib_device *ib_device)
cm_recv_handler,
port,
0);
if (IS_ERR(port->mad_agent))
if (IS_ERR(port->mad_agent)) {
ret = PTR_ERR(port->mad_agent);
goto error2;
}
ret = ib_modify_port(ib_device, i, 0, &port_modify);
if (ret)
@@ -4447,15 +4405,17 @@ static void cm_add_one(struct ib_device *ib_device)
count++;
}
if (!count)
if (!count) {
ret = -EOPNOTSUPP;
goto free;
}
ib_set_client_data(ib_device, &cm_client, cm_dev);
write_lock_irqsave(&cm.device_lock, flags);
list_add_tail(&cm_dev->list, &cm.device_list);
write_unlock_irqrestore(&cm.device_lock, flags);
return;
return 0;
error3:
ib_unregister_mad_agent(port->mad_agent);
@@ -4477,6 +4437,7 @@ error1:
}
free:
kfree(cm_dev);
return ret;
}
static void cm_remove_one(struct ib_device *ib_device, void *client_data)
@@ -4491,9 +4452,6 @@ static void cm_remove_one(struct ib_device *ib_device, void *client_data)
unsigned long flags;
int i;
if (!cm_dev)
return;
write_lock_irqsave(&cm.device_lock, flags);
list_del(&cm_dev->list);
write_unlock_irqrestore(&cm.device_lock, flags);


@@ -91,7 +91,13 @@ const char *__attribute_const__ rdma_reject_msg(struct rdma_cm_id *id,
}
EXPORT_SYMBOL(rdma_reject_msg);
bool rdma_is_consumer_reject(struct rdma_cm_id *id, int reason)
/**
* rdma_is_consumer_reject - return true if the consumer rejected the connect
* request.
* @id: Communication identifier that received the REJECT event.
* @reason: Value returned in the REJECT event status field.
*/
static bool rdma_is_consumer_reject(struct rdma_cm_id *id, int reason)
{
if (rdma_ib_or_roce(id->device, id->port_num))
return reason == IB_CM_REJ_CONSUMER_DEFINED;
@@ -102,7 +108,6 @@ bool rdma_is_consumer_reject(struct rdma_cm_id *id, int reason)
WARN_ON_ONCE(1);
return false;
}
EXPORT_SYMBOL(rdma_is_consumer_reject);
const void *rdma_consumer_reject_data(struct rdma_cm_id *id,
struct rdma_cm_event *ev, u8 *data_len)
@@ -148,7 +153,7 @@ struct rdma_cm_id *rdma_res_to_id(struct rdma_restrack_entry *res)
}
EXPORT_SYMBOL(rdma_res_to_id);
static void cma_add_one(struct ib_device *device);
static int cma_add_one(struct ib_device *device);
static void cma_remove_one(struct ib_device *device, void *client_data);
static struct ib_client cma_client = {
@@ -479,6 +484,7 @@ static void _cma_attach_to_dev(struct rdma_id_private *id_priv,
rdma_restrack_kadd(&id_priv->res);
else
rdma_restrack_uadd(&id_priv->res);
trace_cm_id_attach(id_priv, cma_dev->device);
}
static void cma_attach_to_dev(struct rdma_id_private *id_priv,
@@ -883,7 +889,6 @@ struct rdma_cm_id *__rdma_create_id(struct net *net,
id_priv->id.route.addr.dev_addr.net = get_net(net);
id_priv->seq_num &= 0x00ffffff;
trace_cm_id_create(id_priv);
return &id_priv->id;
}
EXPORT_SYMBOL(__rdma_create_id);
@@ -1906,6 +1911,9 @@ static void cma_set_rep_event_data(struct rdma_cm_event *event,
event->param.conn.rnr_retry_count = rep_data->rnr_retry_count;
event->param.conn.srq = rep_data->srq;
event->param.conn.qp_num = rep_data->remote_qpn;
event->ece.vendor_id = rep_data->ece.vendor_id;
event->ece.attr_mod = rep_data->ece.attr_mod;
}
static int cma_cm_event_handler(struct rdma_id_private *id_priv,
@@ -2124,6 +2132,9 @@ static void cma_set_req_event_data(struct rdma_cm_event *event,
event->param.conn.rnr_retry_count = req_data->rnr_retry_count;
event->param.conn.srq = req_data->srq;
event->param.conn.qp_num = req_data->remote_qpn;
event->ece.vendor_id = req_data->ece.vendor_id;
event->ece.attr_mod = req_data->ece.attr_mod;
}
static int cma_ib_check_req_qp_type(const struct rdma_cm_id *id,
@@ -2904,6 +2915,24 @@ static int iboe_tos_to_sl(struct net_device *ndev, int tos)
return 0;
}
static __be32 cma_get_roce_udp_flow_label(struct rdma_id_private *id_priv)
{
struct sockaddr_in6 *addr6;
u16 dport, sport;
u32 hash, fl;
addr6 = (struct sockaddr_in6 *)cma_src_addr(id_priv);
fl = be32_to_cpu(addr6->sin6_flowinfo) & IB_GRH_FLOWLABEL_MASK;
if ((cma_family(id_priv) != AF_INET6) || !fl) {
dport = be16_to_cpu(cma_port(cma_dst_addr(id_priv)));
sport = be16_to_cpu(cma_port(cma_src_addr(id_priv)));
hash = (u32)sport * 31 + dport;
fl = hash & IB_GRH_FLOWLABEL_MASK;
}
return cpu_to_be32(fl);
}
static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
{
struct rdma_route *route = &id_priv->id.route;
@@ -2970,6 +2999,11 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
goto err2;
}
if (rdma_protocol_roce_udp_encap(id_priv->id.device,
id_priv->id.port_num))
route->path_rec->flow_label =
cma_get_roce_udp_flow_label(id_priv);
cma_init_resolve_route_work(work, id_priv);
queue_work(cma_wq, &work->work);
@@ -3919,6 +3953,8 @@ static int cma_connect_ib(struct rdma_id_private *id_priv,
req.local_cm_response_timeout = CMA_CM_RESPONSE_TIMEOUT;
req.max_cm_retries = CMA_MAX_CM_RETRIES;
req.srq = id_priv->srq ? 1 : 0;
req.ece.vendor_id = id_priv->ece.vendor_id;
req.ece.attr_mod = id_priv->ece.attr_mod;
trace_cm_send_req(id_priv);
ret = ib_send_cm_req(id_priv->cm_id.ib, &req);
@@ -4008,6 +4044,27 @@ err:
}
EXPORT_SYMBOL(rdma_connect);
/**
* rdma_connect_ece - Initiate an active connection request with ECE data.
* @id: Connection identifier to connect.
* @conn_param: Connection information used for connected QPs.
* @ece: ECE parameters
*
* See rdma_connect() explanation.
*/
int rdma_connect_ece(struct rdma_cm_id *id, struct rdma_conn_param *conn_param,
struct rdma_ucm_ece *ece)
{
struct rdma_id_private *id_priv =
container_of(id, struct rdma_id_private, id);
id_priv->ece.vendor_id = ece->vendor_id;
id_priv->ece.attr_mod = ece->attr_mod;
return rdma_connect(id, conn_param);
}
EXPORT_SYMBOL(rdma_connect_ece);
static int cma_accept_ib(struct rdma_id_private *id_priv,
struct rdma_conn_param *conn_param)
{
@@ -4033,6 +4090,8 @@ static int cma_accept_ib(struct rdma_id_private *id_priv,
rep.flow_control = conn_param->flow_control;
rep.rnr_retry_count = min_t(u8, 7, conn_param->rnr_retry_count);
rep.srq = id_priv->srq ? 1 : 0;
rep.ece.vendor_id = id_priv->ece.vendor_id;
rep.ece.attr_mod = id_priv->ece.attr_mod;
trace_cm_send_rep(id_priv);
ret = ib_send_cm_rep(id_priv->cm_id.ib, &rep);
@@ -4080,7 +4139,11 @@ static int cma_send_sidr_rep(struct rdma_id_private *id_priv,
return ret;
rep.qp_num = id_priv->qp_num;
rep.qkey = id_priv->qkey;
rep.ece.vendor_id = id_priv->ece.vendor_id;
rep.ece.attr_mod = id_priv->ece.attr_mod;
}
rep.private_data = private_data;
rep.private_data_len = private_data_len;
@@ -4133,11 +4196,24 @@ int __rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param,
return 0;
reject:
cma_modify_qp_err(id_priv);
rdma_reject(id, NULL, 0);
rdma_reject(id, NULL, 0, IB_CM_REJ_CONSUMER_DEFINED);
return ret;
}
EXPORT_SYMBOL(__rdma_accept);
int __rdma_accept_ece(struct rdma_cm_id *id, struct rdma_conn_param *conn_param,
const char *caller, struct rdma_ucm_ece *ece)
{
struct rdma_id_private *id_priv =
container_of(id, struct rdma_id_private, id);
id_priv->ece.vendor_id = ece->vendor_id;
id_priv->ece.attr_mod = ece->attr_mod;
return __rdma_accept(id, conn_param, caller);
}
EXPORT_SYMBOL(__rdma_accept_ece);
int rdma_notify(struct rdma_cm_id *id, enum ib_event_type event)
{
struct rdma_id_private *id_priv;
@@ -4160,7 +4236,7 @@ int rdma_notify(struct rdma_cm_id *id, enum ib_event_type event)
EXPORT_SYMBOL(rdma_notify);
int rdma_reject(struct rdma_cm_id *id, const void *private_data,
u8 private_data_len)
u8 private_data_len, u8 reason)
{
struct rdma_id_private *id_priv;
int ret;
@@ -4175,9 +4251,8 @@ int rdma_reject(struct rdma_cm_id *id, const void *private_data,
private_data, private_data_len);
} else {
trace_cm_send_rej(id_priv);
ret = ib_send_cm_rej(id_priv->cm_id.ib,
IB_CM_REJ_CONSUMER_DEFINED, NULL,
0, private_data, private_data_len);
ret = ib_send_cm_rej(id_priv->cm_id.ib, reason, NULL, 0,
private_data, private_data_len);
}
} else if (rdma_cap_iw_cm(id->device, id->port_num)) {
ret = iw_cm_reject(id_priv->cm_id.iw,
@@ -4633,29 +4708,34 @@ static struct notifier_block cma_nb = {
.notifier_call = cma_netdev_callback
};
static void cma_add_one(struct ib_device *device)
static int cma_add_one(struct ib_device *device)
{
struct cma_device *cma_dev;
struct rdma_id_private *id_priv;
unsigned int i;
unsigned long supported_gids = 0;
int ret;
cma_dev = kmalloc(sizeof *cma_dev, GFP_KERNEL);
if (!cma_dev)
return;
return -ENOMEM;
cma_dev->device = device;
cma_dev->default_gid_type = kcalloc(device->phys_port_cnt,
sizeof(*cma_dev->default_gid_type),
GFP_KERNEL);
if (!cma_dev->default_gid_type)
if (!cma_dev->default_gid_type) {
ret = -ENOMEM;
goto free_cma_dev;
}
cma_dev->default_roce_tos = kcalloc(device->phys_port_cnt,
sizeof(*cma_dev->default_roce_tos),
GFP_KERNEL);
if (!cma_dev->default_roce_tos)
if (!cma_dev->default_roce_tos) {
ret = -ENOMEM;
goto free_gid_type;
}
rdma_for_each_port (device, i) {
supported_gids = roce_gid_type_mask_support(device, i);
@@ -4681,15 +4761,14 @@ static void cma_add_one(struct ib_device *device)
mutex_unlock(&lock);
trace_cm_add_one(device);
return;
return 0;
free_gid_type:
kfree(cma_dev->default_gid_type);
free_cma_dev:
kfree(cma_dev);
return;
return ret;
}
static int cma_remove_id_dev(struct rdma_id_private *id_priv)
@@ -4751,9 +4830,6 @@ static void cma_remove_one(struct ib_device *device, void *client_data)
trace_cm_remove_one(device);
if (!cma_dev)
return;
mutex_lock(&lock);
list_del(&cma_dev->list);
mutex_unlock(&lock);


@@ -322,8 +322,21 @@ fail:
return ERR_PTR(err);
}
static void drop_cma_dev(struct config_group *cgroup, struct config_item *item)
{
struct config_group *group =
container_of(item, struct config_group, cg_item);
struct cma_dev_group *cma_dev_group =
container_of(group, struct cma_dev_group, device_group);
configfs_remove_default_groups(&cma_dev_group->ports_group);
configfs_remove_default_groups(&cma_dev_group->device_group);
config_item_put(item);
}
static struct configfs_group_operations cma_subsys_group_ops = {
.make_group = make_cma_dev,
.drop_item = drop_cma_dev,
};
static const struct config_item_type cma_subsys_type = {


@@ -95,6 +95,7 @@ struct rdma_id_private {
* Internal to RDMA/core, don't use in the drivers
*/
struct rdma_restrack_entry res;
struct rdma_ucm_ece ece;
};
#if IS_ENABLED(CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS)


@@ -103,23 +103,33 @@ DEFINE_CMA_FSM_EVENT(sent_drep);
DEFINE_CMA_FSM_EVENT(sent_dreq);
DEFINE_CMA_FSM_EVENT(id_destroy);
TRACE_EVENT(cm_id_create,
TRACE_EVENT(cm_id_attach,
TP_PROTO(
const struct rdma_id_private *id_priv
const struct rdma_id_private *id_priv,
const struct ib_device *device
),
TP_ARGS(id_priv),
TP_ARGS(id_priv, device),
TP_STRUCT__entry(
__field(u32, cm_id)
__array(unsigned char, srcaddr, sizeof(struct sockaddr_in6))
__array(unsigned char, dstaddr, sizeof(struct sockaddr_in6))
__string(devname, device->name)
),
TP_fast_assign(
__entry->cm_id = id_priv->res.id;
memcpy(__entry->srcaddr, &id_priv->id.route.addr.src_addr,
sizeof(struct sockaddr_in6));
memcpy(__entry->dstaddr, &id_priv->id.route.addr.dst_addr,
sizeof(struct sockaddr_in6));
__assign_str(devname, device->name);
),
TP_printk("cm.id=%u",
__entry->cm_id
TP_printk("cm.id=%u src=%pISpc dst=%pISpc device=%s",
__entry->cm_id, __entry->srcaddr, __entry->dstaddr,
__get_str(devname)
)
);


@@ -414,4 +414,7 @@ void rdma_umap_priv_init(struct rdma_umap_priv *priv,
struct vm_area_struct *vma,
struct rdma_user_mmap_entry *entry);
void ib_cq_pool_init(struct ib_device *dev);
void ib_cq_pool_destroy(struct ib_device *dev);
#endif /* _CORE_PRIV_H */


@@ -7,7 +7,11 @@
#include <linux/slab.h>
#include <rdma/ib_verbs.h>
#include "core_priv.h"
#include <trace/events/rdma_core.h>
/* Max size for shared CQ, may require tuning */
#define IB_MAX_SHARED_CQ_SZ 4096U
/* # of WCs to poll for with a single call to ib_poll_cq */
#define IB_POLL_BATCH 16
@@ -218,6 +222,7 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private,
cq->cq_context = private;
cq->poll_ctx = poll_ctx;
atomic_set(&cq->usecnt, 0);
cq->comp_vector = comp_vector;
cq->wc = kmalloc_array(IB_POLL_BATCH, sizeof(*cq->wc), GFP_KERNEL);
if (!cq->wc)
@@ -309,6 +314,8 @@ void ib_free_cq_user(struct ib_cq *cq, struct ib_udata *udata)
{
if (WARN_ON_ONCE(atomic_read(&cq->usecnt)))
return;
if (WARN_ON_ONCE(cq->cqe_used))
return;
switch (cq->poll_ctx) {
case IB_POLL_DIRECT:
@@ -334,3 +341,169 @@ void ib_free_cq_user(struct ib_cq *cq, struct ib_udata *udata)
kfree(cq);
}
EXPORT_SYMBOL(ib_free_cq_user);
void ib_cq_pool_init(struct ib_device *dev)
{
unsigned int i;
spin_lock_init(&dev->cq_pools_lock);
for (i = 0; i < ARRAY_SIZE(dev->cq_pools); i++)
INIT_LIST_HEAD(&dev->cq_pools[i]);
}
void ib_cq_pool_destroy(struct ib_device *dev)
{
struct ib_cq *cq, *n;
unsigned int i;
for (i = 0; i < ARRAY_SIZE(dev->cq_pools); i++) {
list_for_each_entry_safe(cq, n, &dev->cq_pools[i],
pool_entry) {
WARN_ON(cq->cqe_used);
cq->shared = false;
ib_free_cq(cq);
}
}
}
static int ib_alloc_cqs(struct ib_device *dev, unsigned int nr_cqes,
enum ib_poll_context poll_ctx)
{
LIST_HEAD(tmp_list);
unsigned int nr_cqs, i;
struct ib_cq *cq;
int ret;
if (poll_ctx > IB_POLL_LAST_POOL_TYPE) {
WARN_ON_ONCE(poll_ctx > IB_POLL_LAST_POOL_TYPE);
return -EINVAL;
}
/*
* Allocate at least as many CQEs as requested, and otherwise
* a reasonable batch size so that we can share CQs between
* multiple users instead of allocating a larger number of CQs.
*/
nr_cqes = min_t(unsigned int, dev->attrs.max_cqe,
max(nr_cqes, IB_MAX_SHARED_CQ_SZ));
nr_cqs = min_t(unsigned int, dev->num_comp_vectors, num_online_cpus());
for (i = 0; i < nr_cqs; i++) {
cq = ib_alloc_cq(dev, NULL, nr_cqes, i, poll_ctx);
if (IS_ERR(cq)) {
ret = PTR_ERR(cq);
goto out_free_cqs;
}
cq->shared = true;
list_add_tail(&cq->pool_entry, &tmp_list);
}
spin_lock_irq(&dev->cq_pools_lock);
list_splice(&tmp_list, &dev->cq_pools[poll_ctx]);
spin_unlock_irq(&dev->cq_pools_lock);
return 0;
out_free_cqs:
list_for_each_entry(cq, &tmp_list, pool_entry) {
cq->shared = false;
ib_free_cq(cq);
}
return ret;
}
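The CQ sizing rule in ib_alloc_cqs() above can be isolated: requests are rounded up to a shareable batch so several users fit in one CQ, but never past the device's `max_cqe`. A user-space sketch (the helper name is hypothetical; `IB_MAX_SHARED_CQ_SZ` is the batch constant defined earlier in this file):

```c
#define IB_MAX_SHARED_CQ_SZ 4096u	/* shareable batch size from cq.c */

/* Mirror of the min_t(max_cqe, max(nr_cqes, IB_MAX_SHARED_CQ_SZ))
 * expression in ib_alloc_cqs(): allocate at least what was asked for,
 * round small requests up to the batch, cap at the device limit. */
static unsigned int shared_cq_size(unsigned int nr_cqes, unsigned int max_cqe)
{
	unsigned int want = nr_cqes > IB_MAX_SHARED_CQ_SZ ?
			    nr_cqes : IB_MAX_SHARED_CQ_SZ;

	return want < max_cqe ? want : max_cqe;
}
```

So a small consumer (say 128 CQEs) still gets a 4096-entry CQ that later consumers can share, while an oversized request degrades gracefully to the device maximum.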
/**
* ib_cq_pool_get() - Find the least used completion queue that matches
* a given cpu hint (or least used for wild card affinity) and fits
* nr_cqe.
* @dev: rdma device
* @nr_cqe: number of needed cqe entries
* @comp_vector_hint: completion vector hint (-1) for the driver to assign
* a comp vector based on internal counter
* @poll_ctx: cq polling context
*
* Finds a cq that satisfies @comp_vector_hint and @nr_cqe requirements and
* claims entries in it for us. In case there is no available cq, allocates
* a new cq with the requirements and adds it to the device pool.
* IB_POLL_DIRECT cannot be used for shared cqs so it is not a valid value
* for @poll_ctx.
*/
struct ib_cq *ib_cq_pool_get(struct ib_device *dev, unsigned int nr_cqe,
int comp_vector_hint,
enum ib_poll_context poll_ctx)
{
static unsigned int default_comp_vector;
unsigned int vector, num_comp_vectors;
struct ib_cq *cq, *found = NULL;
int ret;
if (poll_ctx > IB_POLL_LAST_POOL_TYPE) {
WARN_ON_ONCE(poll_ctx > IB_POLL_LAST_POOL_TYPE);
return ERR_PTR(-EINVAL);
}
num_comp_vectors =
min_t(unsigned int, dev->num_comp_vectors, num_online_cpus());
/* Project the affinity to the device completion vector range */
if (comp_vector_hint < 0) {
comp_vector_hint =
(READ_ONCE(default_comp_vector) + 1) % num_comp_vectors;
WRITE_ONCE(default_comp_vector, comp_vector_hint);
}
vector = comp_vector_hint % num_comp_vectors;
/*
* Find the least used CQ with correct affinity and
* enough free CQ entries
*/
while (!found) {
spin_lock_irq(&dev->cq_pools_lock);
list_for_each_entry(cq, &dev->cq_pools[poll_ctx],
pool_entry) {
/*
* Check to see if we have found a CQ with the
* correct completion vector
*/
if (vector != cq->comp_vector)
continue;
if (cq->cqe_used + nr_cqe > cq->cqe)
continue;
found = cq;
break;
}
if (found) {
found->cqe_used += nr_cqe;
spin_unlock_irq(&dev->cq_pools_lock);
return found;
}
spin_unlock_irq(&dev->cq_pools_lock);
/*
* Didn't find a match or ran out of CQs in the device
* pool, allocate a new array of CQs.
*/
ret = ib_alloc_cqs(dev, nr_cqe, poll_ctx);
if (ret)
return ERR_PTR(ret);
}
return found;
}
EXPORT_SYMBOL(ib_cq_pool_get);
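The default comp-vector selection in ib_cq_pool_get() rotates a shared counter when the caller passes a negative hint, so hint-less callers spread across the device's usable completion vectors. A minimal user-space sketch of that round-robin (names are illustrative, not the kernel's):

```c
#include <assert.h>

/* Shared counter, mirroring the function-local static in the kernel code. */
static unsigned int default_comp_vector;

/* Pick a completion vector: a negative hint advances the shared counter
 * round-robin; any non-negative hint is simply wrapped into range. */
static unsigned int pick_comp_vector(int hint, unsigned int num_comp_vectors)
{
	if (hint < 0) {
		hint = (default_comp_vector + 1) % num_comp_vectors;
		default_comp_vector = hint;
	}
	return (unsigned int)hint % num_comp_vectors;
}
```

With 4 vectors, successive hint-less calls yield 1, 2, 3, 0, 1, … while an explicit hint of 6 maps to 6 % 4 = 2.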
/**
* ib_cq_pool_put - Return a CQ taken from a shared pool.
* @cq: The CQ to return.
* @nr_cqe: The max number of cqes that the user had requested.
*/
void ib_cq_pool_put(struct ib_cq *cq, unsigned int nr_cqe)
{
if (WARN_ON_ONCE(nr_cqe > cq->cqe_used))
return;
spin_lock_irq(&cq->device->cq_pools_lock);
cq->cqe_used -= nr_cqe;
spin_unlock_irq(&cq->device->cq_pools_lock);
}
EXPORT_SYMBOL(ib_cq_pool_put);
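Together, ib_cq_pool_get() and ib_cq_pool_put() maintain a simple invariant on each shared CQ: claimed entries never exceed capacity, and a put never returns more than is claimed. A hypothetical stand-alone sketch of that accounting:

```c
#include <assert.h>

/* Illustrative stand-in for the pool's per-CQ bookkeeping fields. */
struct pool_cq {
	unsigned int cqe;      /* capacity */
	unsigned int cqe_used; /* entries currently claimed */
};

/* Claim nr_cqe entries if they fit; -1 means the caller must try
 * another CQ (or allocate a new one), as the pool-get loop does. */
static int pool_cq_get(struct pool_cq *cq, unsigned int nr_cqe)
{
	if (cq->cqe_used + nr_cqe > cq->cqe)
		return -1;
	cq->cqe_used += nr_cqe;
	return 0;
}

/* Return nr_cqe entries; refusing an underflow corresponds to the
 * WARN_ON_ONCE() guard in ib_cq_pool_put(). */
static int pool_cq_put(struct pool_cq *cq, unsigned int nr_cqe)
{
	if (nr_cqe > cq->cqe_used)
		return -1;
	cq->cqe_used -= nr_cqe;
	return 0;
}
```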


@ -677,8 +677,20 @@ static int add_client_context(struct ib_device *device,
if (ret)
goto out;
downgrade_write(&device->client_data_rwsem);
if (client->add)
client->add(device);
if (client->add) {
if (client->add(device)) {
/*
* If a client fails to add then the error code is
* ignored, but we won't call any more ops on this
* client.
*/
xa_erase(&device->client_data, client->client_id);
up_read(&device->client_data_rwsem);
ib_device_put(device);
ib_client_put(client);
return 0;
}
}
/* Readers shall not see a client until add has been completed */
xa_set_mark(&device->client_data, client->client_id,
@ -1381,6 +1393,7 @@ int ib_register_device(struct ib_device *device, const char *name)
goto dev_cleanup;
}
ib_cq_pool_init(device);
ret = enable_device_and_get(device);
dev_set_uevent_suppress(&device->dev, false);
/* Mark for userspace that device is ready */
@ -1435,6 +1448,7 @@ static void __ib_unregister_device(struct ib_device *ib_dev)
goto out;
disable_device(ib_dev);
ib_cq_pool_destroy(ib_dev);
/* Expedite removing unregistered pointers from the hash table */
free_netdevs(ib_dev);
@ -2557,7 +2571,6 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
SET_DEVICE_OP(dev_ops, add_gid);
SET_DEVICE_OP(dev_ops, advise_mr);
SET_DEVICE_OP(dev_ops, alloc_dm);
SET_DEVICE_OP(dev_ops, alloc_fmr);
SET_DEVICE_OP(dev_ops, alloc_hw_stats);
SET_DEVICE_OP(dev_ops, alloc_mr);
SET_DEVICE_OP(dev_ops, alloc_mr_integrity);
@ -2584,7 +2597,6 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
SET_DEVICE_OP(dev_ops, create_wq);
SET_DEVICE_OP(dev_ops, dealloc_dm);
SET_DEVICE_OP(dev_ops, dealloc_driver);
SET_DEVICE_OP(dev_ops, dealloc_fmr);
SET_DEVICE_OP(dev_ops, dealloc_mw);
SET_DEVICE_OP(dev_ops, dealloc_pd);
SET_DEVICE_OP(dev_ops, dealloc_ucontext);
@ -2628,7 +2640,6 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
SET_DEVICE_OP(dev_ops, iw_rem_ref);
SET_DEVICE_OP(dev_ops, map_mr_sg);
SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
SET_DEVICE_OP(dev_ops, map_phys_fmr);
SET_DEVICE_OP(dev_ops, mmap);
SET_DEVICE_OP(dev_ops, mmap_free);
SET_DEVICE_OP(dev_ops, modify_ah);
@ -2662,7 +2673,6 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
SET_DEVICE_OP(dev_ops, resize_cq);
SET_DEVICE_OP(dev_ops, set_vf_guid);
SET_DEVICE_OP(dev_ops, set_vf_link_state);
SET_DEVICE_OP(dev_ops, unmap_fmr);
SET_OBJ_SIZE(dev_ops, ib_ah);
SET_OBJ_SIZE(dev_ops, ib_cq);


@ -1,494 +0,0 @@
/*
* Copyright (c) 2004 Topspin Communications. All rights reserved.
* Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#include <linux/errno.h>
#include <linux/spinlock.h>
#include <linux/export.h>
#include <linux/slab.h>
#include <linux/jhash.h>
#include <linux/kthread.h>
#include <rdma/ib_fmr_pool.h>
#include "core_priv.h"
#define PFX "fmr_pool: "
enum {
IB_FMR_MAX_REMAPS = 32,
IB_FMR_HASH_BITS = 8,
IB_FMR_HASH_SIZE = 1 << IB_FMR_HASH_BITS,
IB_FMR_HASH_MASK = IB_FMR_HASH_SIZE - 1
};
/*
* If an FMR is not in use, then the list member will point to either
* its pool's free_list (if the FMR can be mapped again; that is,
* remap_count < pool->max_remaps) or its pool's dirty_list (if the
* FMR needs to be unmapped before being remapped). In either of
* these cases it is a bug if the ref_count is not 0. In other words,
* if ref_count is > 0, then the list member must not be linked into
* either free_list or dirty_list.
*
* The cache_node member is used to link the FMR into a cache bucket
* (if caching is enabled). This is independent of the reference
* count of the FMR. When a valid FMR is released, its ref_count is
* decremented, and if ref_count reaches 0, the FMR is placed in
* either free_list or dirty_list as appropriate. However, it is not
* removed from the cache and may be "revived" if a call to
* ib_fmr_register_physical() occurs before the FMR is remapped. In
* this case we just increment the ref_count and remove the FMR from
* free_list/dirty_list.
*
* Before we remap an FMR from free_list, we remove it from the cache
* (to prevent another user from obtaining a stale FMR). When an FMR
* is released, we add it to the tail of the free list, so that our
* cache eviction policy is "least recently used."
*
* All manipulation of ref_count, list and cache_node is protected by
* pool_lock to maintain consistency.
*/
struct ib_fmr_pool {
spinlock_t pool_lock;
int pool_size;
int max_pages;
int max_remaps;
int dirty_watermark;
int dirty_len;
struct list_head free_list;
struct list_head dirty_list;
struct hlist_head *cache_bucket;
void (*flush_function)(struct ib_fmr_pool *pool,
void * arg);
void *flush_arg;
struct kthread_worker *worker;
struct kthread_work work;
atomic_t req_ser;
atomic_t flush_ser;
wait_queue_head_t force_wait;
};
static inline u32 ib_fmr_hash(u64 first_page)
{
return jhash_2words((u32) first_page, (u32) (first_page >> 32), 0) &
(IB_FMR_HASH_SIZE - 1);
}
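ib_fmr_hash() buckets an FMR by the 64-bit address of its first page: the address is split into two 32-bit words, mixed, and masked to a power-of-two table size. A user-space sketch of the same shape, with a stand-in mixer in place of the kernel's jhash_2words():

```c
#include <assert.h>
#include <stdint.h>

#define FMR_HASH_BITS 8
#define FMR_HASH_SIZE (1u << FMR_HASH_BITS)

/* Stand-in 32-bit mixer, NOT jhash_2words(); only the split-and-mask
 * structure is the point here. */
static uint32_t mix2(uint32_t a, uint32_t b)
{
	uint32_t h = a * 0x9e3779b9u ^ b * 0x85ebca6bu;
	h ^= h >> 16;
	return h;
}

/* Bucket index for the first page address: mix both halves of the
 * 64-bit value, then mask to the table size. */
static uint32_t fmr_hash(uint64_t first_page)
{
	return mix2((uint32_t)first_page, (uint32_t)(first_page >> 32)) &
	       (FMR_HASH_SIZE - 1);
}
```

Masking with `FMR_HASH_SIZE - 1` requires the table size to be a power of two, which the `1 << IB_FMR_HASH_BITS` definition guarantees.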
/* Caller must hold pool_lock */
static inline struct ib_pool_fmr *ib_fmr_cache_lookup(struct ib_fmr_pool *pool,
u64 *page_list,
int page_list_len,
u64 io_virtual_address)
{
struct hlist_head *bucket;
struct ib_pool_fmr *fmr;
if (!pool->cache_bucket)
return NULL;
bucket = pool->cache_bucket + ib_fmr_hash(*page_list);
hlist_for_each_entry(fmr, bucket, cache_node)
if (io_virtual_address == fmr->io_virtual_address &&
page_list_len == fmr->page_list_len &&
!memcmp(page_list, fmr->page_list,
page_list_len * sizeof *page_list))
return fmr;
return NULL;
}
static void ib_fmr_batch_release(struct ib_fmr_pool *pool)
{
int ret;
struct ib_pool_fmr *fmr;
LIST_HEAD(unmap_list);
LIST_HEAD(fmr_list);
spin_lock_irq(&pool->pool_lock);
list_for_each_entry(fmr, &pool->dirty_list, list) {
hlist_del_init(&fmr->cache_node);
fmr->remap_count = 0;
list_add_tail(&fmr->fmr->list, &fmr_list);
}
list_splice_init(&pool->dirty_list, &unmap_list);
pool->dirty_len = 0;
spin_unlock_irq(&pool->pool_lock);
if (list_empty(&unmap_list)) {
return;
}
ret = ib_unmap_fmr(&fmr_list);
if (ret)
pr_warn(PFX "ib_unmap_fmr returned %d\n", ret);
spin_lock_irq(&pool->pool_lock);
list_splice(&unmap_list, &pool->free_list);
spin_unlock_irq(&pool->pool_lock);
}
static void ib_fmr_cleanup_func(struct kthread_work *work)
{
struct ib_fmr_pool *pool = container_of(work, struct ib_fmr_pool, work);
ib_fmr_batch_release(pool);
atomic_inc(&pool->flush_ser);
wake_up_interruptible(&pool->force_wait);
if (pool->flush_function)
pool->flush_function(pool, pool->flush_arg);
if (atomic_read(&pool->flush_ser) - atomic_read(&pool->req_ser) < 0)
kthread_queue_work(pool->worker, &pool->work);
}
/**
* ib_create_fmr_pool - Create an FMR pool
* @pd:Protection domain for FMRs
* @params:FMR pool parameters
*
* Create a pool of FMRs. Return value is pointer to new pool or
* error code if creation failed.
*/
struct ib_fmr_pool *ib_create_fmr_pool(struct ib_pd *pd,
struct ib_fmr_pool_param *params)
{
struct ib_device *device;
struct ib_fmr_pool *pool;
int i;
int ret;
int max_remaps;
if (!params)
return ERR_PTR(-EINVAL);
device = pd->device;
if (!device->ops.alloc_fmr || !device->ops.dealloc_fmr ||
!device->ops.map_phys_fmr || !device->ops.unmap_fmr) {
dev_info(&device->dev, "Device does not support FMRs\n");
return ERR_PTR(-ENOSYS);
}
if (!device->attrs.max_map_per_fmr)
max_remaps = IB_FMR_MAX_REMAPS;
else
max_remaps = device->attrs.max_map_per_fmr;
pool = kmalloc(sizeof *pool, GFP_KERNEL);
if (!pool)
return ERR_PTR(-ENOMEM);
pool->cache_bucket = NULL;
pool->flush_function = params->flush_function;
pool->flush_arg = params->flush_arg;
INIT_LIST_HEAD(&pool->free_list);
INIT_LIST_HEAD(&pool->dirty_list);
if (params->cache) {
pool->cache_bucket =
kmalloc_array(IB_FMR_HASH_SIZE,
sizeof(*pool->cache_bucket),
GFP_KERNEL);
if (!pool->cache_bucket) {
ret = -ENOMEM;
goto out_free_pool;
}
for (i = 0; i < IB_FMR_HASH_SIZE; ++i)
INIT_HLIST_HEAD(pool->cache_bucket + i);
}
pool->pool_size = 0;
pool->max_pages = params->max_pages_per_fmr;
pool->max_remaps = max_remaps;
pool->dirty_watermark = params->dirty_watermark;
pool->dirty_len = 0;
spin_lock_init(&pool->pool_lock);
atomic_set(&pool->req_ser, 0);
atomic_set(&pool->flush_ser, 0);
init_waitqueue_head(&pool->force_wait);
pool->worker =
kthread_create_worker(0, "ib_fmr(%s)", dev_name(&device->dev));
if (IS_ERR(pool->worker)) {
pr_warn(PFX "couldn't start cleanup kthread worker\n");
ret = PTR_ERR(pool->worker);
goto out_free_pool;
}
kthread_init_work(&pool->work, ib_fmr_cleanup_func);
{
struct ib_pool_fmr *fmr;
struct ib_fmr_attr fmr_attr = {
.max_pages = params->max_pages_per_fmr,
.max_maps = pool->max_remaps,
.page_shift = params->page_shift
};
int bytes_per_fmr = sizeof *fmr;
if (pool->cache_bucket)
bytes_per_fmr += params->max_pages_per_fmr * sizeof (u64);
for (i = 0; i < params->pool_size; ++i) {
fmr = kmalloc(bytes_per_fmr, GFP_KERNEL);
if (!fmr)
goto out_fail;
fmr->pool = pool;
fmr->remap_count = 0;
fmr->ref_count = 0;
INIT_HLIST_NODE(&fmr->cache_node);
fmr->fmr = ib_alloc_fmr(pd, params->access, &fmr_attr);
if (IS_ERR(fmr->fmr)) {
pr_warn(PFX "fmr_create failed for FMR %d\n",
i);
kfree(fmr);
goto out_fail;
}
list_add_tail(&fmr->list, &pool->free_list);
++pool->pool_size;
}
}
return pool;
out_free_pool:
kfree(pool->cache_bucket);
kfree(pool);
return ERR_PTR(ret);
out_fail:
ib_destroy_fmr_pool(pool);
return ERR_PTR(-ENOMEM);
}
EXPORT_SYMBOL(ib_create_fmr_pool);
/**
* ib_destroy_fmr_pool - Free FMR pool
* @pool:FMR pool to free
*
* Destroy an FMR pool and free all associated resources.
*/
void ib_destroy_fmr_pool(struct ib_fmr_pool *pool)
{
struct ib_pool_fmr *fmr;
struct ib_pool_fmr *tmp;
LIST_HEAD(fmr_list);
int i;
kthread_destroy_worker(pool->worker);
ib_fmr_batch_release(pool);
i = 0;
list_for_each_entry_safe(fmr, tmp, &pool->free_list, list) {
if (fmr->remap_count) {
INIT_LIST_HEAD(&fmr_list);
list_add_tail(&fmr->fmr->list, &fmr_list);
ib_unmap_fmr(&fmr_list);
}
ib_dealloc_fmr(fmr->fmr);
list_del(&fmr->list);
kfree(fmr);
++i;
}
if (i < pool->pool_size)
pr_warn(PFX "pool still has %d regions registered\n",
pool->pool_size - i);
kfree(pool->cache_bucket);
kfree(pool);
}
EXPORT_SYMBOL(ib_destroy_fmr_pool);
/**
* ib_flush_fmr_pool - Invalidate all unmapped FMRs
* @pool:FMR pool to flush
*
* Ensure that all unmapped FMRs are fully invalidated.
*/
int ib_flush_fmr_pool(struct ib_fmr_pool *pool)
{
int serial;
struct ib_pool_fmr *fmr, *next;
/*
* The free_list holds FMRs that may have been used
* but have not been remapped enough times to be dirty.
* Put them on the dirty list now so that the cleanup
* thread will reap them too.
*/
spin_lock_irq(&pool->pool_lock);
list_for_each_entry_safe(fmr, next, &pool->free_list, list) {
if (fmr->remap_count > 0)
list_move(&fmr->list, &pool->dirty_list);
}
spin_unlock_irq(&pool->pool_lock);
serial = atomic_inc_return(&pool->req_ser);
kthread_queue_work(pool->worker, &pool->work);
if (wait_event_interruptible(pool->force_wait,
atomic_read(&pool->flush_ser) - serial >= 0))
return -EINTR;
return 0;
}
EXPORT_SYMBOL(ib_flush_fmr_pool);
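The flush path is a serial-number handshake: ib_flush_fmr_pool() bumps req_ser and waits, the cleanup worker bumps flush_ser after each batch, and the waiter is released once flush_ser has caught up with the serial it drew. Signed subtraction keeps the comparison correct across counter wraparound. A simplified single-threaded sketch (the kernel uses atomics and a waitqueue):

```c
#include <assert.h>

static int req_ser, flush_ser;

/* Request a flush; the returned serial is what the waiter blocks on. */
static int request_flush(void)
{
	return ++req_ser;
}

/* One cleanup-worker pass: a dirty batch has been released. */
static void cleanup_pass(void)
{
	++flush_ser;
}

/* Wraparound-safe "has the worker caught up with my serial?" test. */
static int flush_done(int serial)
{
	return flush_ser - serial >= 0;
}
```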
/**
* ib_fmr_pool_map_phys - Map an FMR from an FMR pool.
* @pool_handle: FMR pool to allocate FMR from
* @page_list: List of pages to map
* @list_len: Number of pages in @page_list
* @io_virtual_address: I/O virtual address for new FMR
*/
struct ib_pool_fmr *ib_fmr_pool_map_phys(struct ib_fmr_pool *pool_handle,
u64 *page_list,
int list_len,
u64 io_virtual_address)
{
struct ib_fmr_pool *pool = pool_handle;
struct ib_pool_fmr *fmr;
unsigned long flags;
int result;
if (list_len < 1 || list_len > pool->max_pages)
return ERR_PTR(-EINVAL);
spin_lock_irqsave(&pool->pool_lock, flags);
fmr = ib_fmr_cache_lookup(pool,
page_list,
list_len,
io_virtual_address);
if (fmr) {
/* found in cache */
++fmr->ref_count;
if (fmr->ref_count == 1) {
list_del(&fmr->list);
}
spin_unlock_irqrestore(&pool->pool_lock, flags);
return fmr;
}
if (list_empty(&pool->free_list)) {
spin_unlock_irqrestore(&pool->pool_lock, flags);
return ERR_PTR(-EAGAIN);
}
fmr = list_entry(pool->free_list.next, struct ib_pool_fmr, list);
list_del(&fmr->list);
hlist_del_init(&fmr->cache_node);
spin_unlock_irqrestore(&pool->pool_lock, flags);
result = ib_map_phys_fmr(fmr->fmr, page_list, list_len,
io_virtual_address);
if (result) {
spin_lock_irqsave(&pool->pool_lock, flags);
list_add(&fmr->list, &pool->free_list);
spin_unlock_irqrestore(&pool->pool_lock, flags);
pr_warn(PFX "fmr_map returns %d\n", result);
return ERR_PTR(result);
}
++fmr->remap_count;
fmr->ref_count = 1;
if (pool->cache_bucket) {
fmr->io_virtual_address = io_virtual_address;
fmr->page_list_len = list_len;
memcpy(fmr->page_list, page_list, list_len * sizeof(*page_list));
spin_lock_irqsave(&pool->pool_lock, flags);
hlist_add_head(&fmr->cache_node,
pool->cache_bucket + ib_fmr_hash(fmr->page_list[0]));
spin_unlock_irqrestore(&pool->pool_lock, flags);
}
return fmr;
}
EXPORT_SYMBOL(ib_fmr_pool_map_phys);
/**
* ib_fmr_pool_unmap - Unmap FMR
* @fmr:FMR to unmap
*
* Unmap an FMR. The FMR mapping may remain valid until the FMR is
* reused (or until ib_flush_fmr_pool() is called).
*/
void ib_fmr_pool_unmap(struct ib_pool_fmr *fmr)
{
struct ib_fmr_pool *pool;
unsigned long flags;
pool = fmr->pool;
spin_lock_irqsave(&pool->pool_lock, flags);
--fmr->ref_count;
if (!fmr->ref_count) {
if (fmr->remap_count < pool->max_remaps) {
list_add_tail(&fmr->list, &pool->free_list);
} else {
list_add_tail(&fmr->list, &pool->dirty_list);
if (++pool->dirty_len >= pool->dirty_watermark) {
atomic_inc(&pool->req_ser);
kthread_queue_work(pool->worker, &pool->work);
}
}
}
spin_unlock_irqrestore(&pool->pool_lock, flags);
}
EXPORT_SYMBOL(ib_fmr_pool_unmap);
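The release bookkeeping above has a small decision core: an FMR whose remap budget is not exhausted returns to the free list; otherwise it is parked on the dirty list, and once dirty_len reaches the watermark a flush of the dirty batch is requested. A sketch of just that decision (names are illustrative, not the kernel's):

```c
#include <assert.h>

enum fmr_dest { FMR_FREE_LIST, FMR_DIRTY_LIST };

struct pool_state {
	int max_remaps;
	int dirty_watermark;
	int dirty_len;
	int flush_requested;
};

/* Decide where a fully released FMR goes, and request a flush once
 * enough dirty FMRs have accumulated. */
static enum fmr_dest fmr_release(struct pool_state *p, int remap_count)
{
	if (remap_count < p->max_remaps)
		return FMR_FREE_LIST;
	if (++p->dirty_len >= p->dirty_watermark)
		p->flush_requested = 1;
	return FMR_DIRTY_LIST;
}
```

With max_remaps = 32 and a watermark of 2, the first exhausted FMR only parks on the dirty list; the second triggers the flush request.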


@ -0,0 +1,138 @@
// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
/*
* Copyright (c) 2020 Mellanox Technologies. All rights reserved.
*/
#include <rdma/ib_verbs.h>
#include <rdma/ib_cache.h>
#include <rdma/lag.h>
static struct sk_buff *rdma_build_skb(struct ib_device *device,
struct net_device *netdev,
struct rdma_ah_attr *ah_attr,
gfp_t flags)
{
struct ipv6hdr *ip6h;
struct sk_buff *skb;
struct ethhdr *eth;
struct iphdr *iph;
struct udphdr *uh;
u8 smac[ETH_ALEN];
bool is_ipv4;
int hdr_len;
is_ipv4 = ipv6_addr_v4mapped((struct in6_addr *)ah_attr->grh.dgid.raw);
hdr_len = ETH_HLEN + sizeof(struct udphdr) + LL_RESERVED_SPACE(netdev);
hdr_len += is_ipv4 ? sizeof(struct iphdr) : sizeof(struct ipv6hdr);
skb = alloc_skb(hdr_len, flags);
if (!skb)
return NULL;
skb->dev = netdev;
skb_reserve(skb, hdr_len);
skb_push(skb, sizeof(struct udphdr));
skb_reset_transport_header(skb);
uh = udp_hdr(skb);
uh->source =
htons(rdma_flow_label_to_udp_sport(ah_attr->grh.flow_label));
uh->dest = htons(ROCE_V2_UDP_DPORT);
uh->len = htons(sizeof(struct udphdr));
if (is_ipv4) {
skb_push(skb, sizeof(struct iphdr));
skb_reset_network_header(skb);
iph = ip_hdr(skb);
iph->frag_off = 0;
iph->version = 4;
iph->protocol = IPPROTO_UDP;
iph->ihl = 0x5;
iph->tot_len = htons(sizeof(struct udphdr) + sizeof(struct iphdr));
memcpy(&iph->saddr, ah_attr->grh.sgid_attr->gid.raw + 12,
sizeof(struct in_addr));
memcpy(&iph->daddr, ah_attr->grh.dgid.raw + 12,
sizeof(struct in_addr));
} else {
skb_push(skb, sizeof(struct ipv6hdr));
skb_reset_network_header(skb);
ip6h = ipv6_hdr(skb);
ip6h->version = 6;
ip6h->nexthdr = IPPROTO_UDP;
memcpy(&ip6h->flow_lbl, &ah_attr->grh.flow_label,
sizeof(*ip6h->flow_lbl));
memcpy(&ip6h->saddr, ah_attr->grh.sgid_attr->gid.raw,
sizeof(struct in6_addr));
memcpy(&ip6h->daddr, ah_attr->grh.dgid.raw,
sizeof(struct in6_addr));
}
skb_push(skb, sizeof(struct ethhdr));
skb_reset_mac_header(skb);
eth = eth_hdr(skb);
skb->protocol = eth->h_proto = htons(is_ipv4 ? ETH_P_IP : ETH_P_IPV6);
rdma_read_gid_l2_fields(ah_attr->grh.sgid_attr, NULL, smac);
memcpy(eth->h_source, smac, ETH_ALEN);
memcpy(eth->h_dest, ah_attr->roce.dmac, ETH_ALEN);
return skb;
}
static struct net_device *rdma_get_xmit_slave_udp(struct ib_device *device,
struct net_device *master,
struct rdma_ah_attr *ah_attr,
gfp_t flags)
{
struct net_device *slave;
struct sk_buff *skb;
skb = rdma_build_skb(device, master, ah_attr, flags);
if (!skb)
return ERR_PTR(-ENOMEM);
rcu_read_lock();
slave = netdev_get_xmit_slave(master, skb,
!!(device->lag_flags &
RDMA_LAG_FLAGS_HASH_ALL_SLAVES));
if (slave)
dev_hold(slave);
rcu_read_unlock();
kfree_skb(skb);
return slave;
}
void rdma_lag_put_ah_roce_slave(struct net_device *xmit_slave)
{
if (xmit_slave)
dev_put(xmit_slave);
}
struct net_device *rdma_lag_get_ah_roce_slave(struct ib_device *device,
struct rdma_ah_attr *ah_attr,
gfp_t flags)
{
struct net_device *slave = NULL;
struct net_device *master;
if (!(ah_attr->type == RDMA_AH_ATTR_TYPE_ROCE &&
ah_attr->grh.sgid_attr->gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP &&
ah_attr->grh.flow_label))
return NULL;
rcu_read_lock();
master = rdma_read_gid_attr_ndev_rcu(ah_attr->grh.sgid_attr);
if (IS_ERR(master)) {
rcu_read_unlock();
return master;
}
dev_hold(master);
rcu_read_unlock();
if (!netif_is_bond_master(master))
goto put;
slave = rdma_get_xmit_slave_udp(device, master, ah_attr, flags);
put:
dev_put(master);
return slave;
}
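The LAG slave selection above only works when the AH carries a flow label, because the bond's transmit hash keys off the UDP source port that rdma_build_skb() derives from it. A hedged sketch of one such derivation, in the spirit of the kernel's rdma_flow_label_to_udp_sport() (the exact folding and the 0xC000 base are assumptions of this sketch): fold the high 6 bits of the 20-bit flow label into the low 14 bits and offset into the dynamic port range, so equal flow labels always pick the same slave.

```c
#include <assert.h>
#include <stdint.h>

/* Deterministically map a 20-bit IPv6 flow label to a UDP source port
 * in the IANA dynamic range (0xC000-0xFFFF). Folding, not truncating,
 * keeps all 20 bits contributing to the result. */
static uint16_t flow_label_to_sport(uint32_t fl)
{
	uint32_t fl_low = fl & 0x03fff;   /* low 14 bits */
	uint32_t fl_high = fl & 0xfc000;  /* high 6 bits */

	fl_low ^= fl_high >> 14;
	return (uint16_t)(fl_low | 0xC000);
}
```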


@ -85,7 +85,6 @@ MODULE_PARM_DESC(send_queue_size, "Size of send queue in number of work requests
module_param_named(recv_queue_size, mad_recvq_size, int, 0444);
MODULE_PARM_DESC(recv_queue_size, "Size of receive queue in number of work requests");
/* Client ID 0 is used for snoop-only clients */
static DEFINE_XARRAY_ALLOC1(ib_mad_clients);
static u32 ib_mad_client_next;
static struct list_head ib_mad_port_list;
@ -483,141 +482,12 @@ error1:
}
EXPORT_SYMBOL(ib_register_mad_agent);
static inline int is_snooping_sends(int mad_snoop_flags)
{
return (mad_snoop_flags &
(/*IB_MAD_SNOOP_POSTED_SENDS |
IB_MAD_SNOOP_RMPP_SENDS |*/
IB_MAD_SNOOP_SEND_COMPLETIONS /*|
IB_MAD_SNOOP_RMPP_SEND_COMPLETIONS*/));
}
static inline int is_snooping_recvs(int mad_snoop_flags)
{
return (mad_snoop_flags &
(IB_MAD_SNOOP_RECVS /*|
IB_MAD_SNOOP_RMPP_RECVS*/));
}
static int register_snoop_agent(struct ib_mad_qp_info *qp_info,
struct ib_mad_snoop_private *mad_snoop_priv)
{
struct ib_mad_snoop_private **new_snoop_table;
unsigned long flags;
int i;
spin_lock_irqsave(&qp_info->snoop_lock, flags);
/* Check for empty slot in array. */
for (i = 0; i < qp_info->snoop_table_size; i++)
if (!qp_info->snoop_table[i])
break;
if (i == qp_info->snoop_table_size) {
/* Grow table. */
new_snoop_table = krealloc(qp_info->snoop_table,
sizeof mad_snoop_priv *
(qp_info->snoop_table_size + 1),
GFP_ATOMIC);
if (!new_snoop_table) {
i = -ENOMEM;
goto out;
}
qp_info->snoop_table = new_snoop_table;
qp_info->snoop_table_size++;
}
qp_info->snoop_table[i] = mad_snoop_priv;
atomic_inc(&qp_info->snoop_count);
out:
spin_unlock_irqrestore(&qp_info->snoop_lock, flags);
return i;
}
struct ib_mad_agent *ib_register_mad_snoop(struct ib_device *device,
u8 port_num,
enum ib_qp_type qp_type,
int mad_snoop_flags,
ib_mad_snoop_handler snoop_handler,
ib_mad_recv_handler recv_handler,
void *context)
{
struct ib_mad_port_private *port_priv;
struct ib_mad_agent *ret;
struct ib_mad_snoop_private *mad_snoop_priv;
int qpn;
int err;
/* Validate parameters */
if ((is_snooping_sends(mad_snoop_flags) && !snoop_handler) ||
(is_snooping_recvs(mad_snoop_flags) && !recv_handler)) {
ret = ERR_PTR(-EINVAL);
goto error1;
}
qpn = get_spl_qp_index(qp_type);
if (qpn == -1) {
ret = ERR_PTR(-EINVAL);
goto error1;
}
port_priv = ib_get_mad_port(device, port_num);
if (!port_priv) {
ret = ERR_PTR(-ENODEV);
goto error1;
}
/* Allocate structures */
mad_snoop_priv = kzalloc(sizeof *mad_snoop_priv, GFP_KERNEL);
if (!mad_snoop_priv) {
ret = ERR_PTR(-ENOMEM);
goto error1;
}
/* Now, fill in the various structures */
mad_snoop_priv->qp_info = &port_priv->qp_info[qpn];
mad_snoop_priv->agent.device = device;
mad_snoop_priv->agent.recv_handler = recv_handler;
mad_snoop_priv->agent.snoop_handler = snoop_handler;
mad_snoop_priv->agent.context = context;
mad_snoop_priv->agent.qp = port_priv->qp_info[qpn].qp;
mad_snoop_priv->agent.port_num = port_num;
mad_snoop_priv->mad_snoop_flags = mad_snoop_flags;
init_completion(&mad_snoop_priv->comp);
err = ib_mad_agent_security_setup(&mad_snoop_priv->agent, qp_type);
if (err) {
ret = ERR_PTR(err);
goto error2;
}
mad_snoop_priv->snoop_index = register_snoop_agent(
&port_priv->qp_info[qpn],
mad_snoop_priv);
if (mad_snoop_priv->snoop_index < 0) {
ret = ERR_PTR(mad_snoop_priv->snoop_index);
goto error3;
}
atomic_set(&mad_snoop_priv->refcount, 1);
return &mad_snoop_priv->agent;
error3:
ib_mad_agent_security_cleanup(&mad_snoop_priv->agent);
error2:
kfree(mad_snoop_priv);
error1:
return ret;
}
EXPORT_SYMBOL(ib_register_mad_snoop);
static inline void deref_mad_agent(struct ib_mad_agent_private *mad_agent_priv)
{
if (atomic_dec_and_test(&mad_agent_priv->refcount))
complete(&mad_agent_priv->comp);
}
static inline void deref_snoop_agent(struct ib_mad_snoop_private *mad_snoop_priv)
{
if (atomic_dec_and_test(&mad_snoop_priv->refcount))
complete(&mad_snoop_priv->comp);
}
static void unregister_mad_agent(struct ib_mad_agent_private *mad_agent_priv)
{
struct ib_mad_port_private *port_priv;
@ -650,25 +520,6 @@ static void unregister_mad_agent(struct ib_mad_agent_private *mad_agent_priv)
kfree_rcu(mad_agent_priv, rcu);
}
static void unregister_mad_snoop(struct ib_mad_snoop_private *mad_snoop_priv)
{
struct ib_mad_qp_info *qp_info;
unsigned long flags;
qp_info = mad_snoop_priv->qp_info;
spin_lock_irqsave(&qp_info->snoop_lock, flags);
qp_info->snoop_table[mad_snoop_priv->snoop_index] = NULL;
atomic_dec(&qp_info->snoop_count);
spin_unlock_irqrestore(&qp_info->snoop_lock, flags);
deref_snoop_agent(mad_snoop_priv);
wait_for_completion(&mad_snoop_priv->comp);
ib_mad_agent_security_cleanup(&mad_snoop_priv->agent);
kfree(mad_snoop_priv);
}
/*
* ib_unregister_mad_agent - Unregisters a client from using MAD services
*
@ -677,20 +528,11 @@ static void unregister_mad_snoop(struct ib_mad_snoop_private *mad_snoop_priv)
void ib_unregister_mad_agent(struct ib_mad_agent *mad_agent)
{
struct ib_mad_agent_private *mad_agent_priv;
struct ib_mad_snoop_private *mad_snoop_priv;
/* If the TID is zero, the agent can only snoop. */
if (mad_agent->hi_tid) {
mad_agent_priv = container_of(mad_agent,
struct ib_mad_agent_private,
agent);
unregister_mad_agent(mad_agent_priv);
} else {
mad_snoop_priv = container_of(mad_agent,
struct ib_mad_snoop_private,
agent);
unregister_mad_snoop(mad_snoop_priv);
}
mad_agent_priv = container_of(mad_agent,
struct ib_mad_agent_private,
agent);
unregister_mad_agent(mad_agent_priv);
}
EXPORT_SYMBOL(ib_unregister_mad_agent);
@ -706,57 +548,6 @@ static void dequeue_mad(struct ib_mad_list_head *mad_list)
spin_unlock_irqrestore(&mad_queue->lock, flags);
}
static void snoop_send(struct ib_mad_qp_info *qp_info,
struct ib_mad_send_buf *send_buf,
struct ib_mad_send_wc *mad_send_wc,
int mad_snoop_flags)
{
struct ib_mad_snoop_private *mad_snoop_priv;
unsigned long flags;
int i;
spin_lock_irqsave(&qp_info->snoop_lock, flags);
for (i = 0; i < qp_info->snoop_table_size; i++) {
mad_snoop_priv = qp_info->snoop_table[i];
if (!mad_snoop_priv ||
!(mad_snoop_priv->mad_snoop_flags & mad_snoop_flags))
continue;
atomic_inc(&mad_snoop_priv->refcount);
spin_unlock_irqrestore(&qp_info->snoop_lock, flags);
mad_snoop_priv->agent.snoop_handler(&mad_snoop_priv->agent,
send_buf, mad_send_wc);
deref_snoop_agent(mad_snoop_priv);
spin_lock_irqsave(&qp_info->snoop_lock, flags);
}
spin_unlock_irqrestore(&qp_info->snoop_lock, flags);
}
static void snoop_recv(struct ib_mad_qp_info *qp_info,
struct ib_mad_recv_wc *mad_recv_wc,
int mad_snoop_flags)
{
struct ib_mad_snoop_private *mad_snoop_priv;
unsigned long flags;
int i;
spin_lock_irqsave(&qp_info->snoop_lock, flags);
for (i = 0; i < qp_info->snoop_table_size; i++) {
mad_snoop_priv = qp_info->snoop_table[i];
if (!mad_snoop_priv ||
!(mad_snoop_priv->mad_snoop_flags & mad_snoop_flags))
continue;
atomic_inc(&mad_snoop_priv->refcount);
spin_unlock_irqrestore(&qp_info->snoop_lock, flags);
mad_snoop_priv->agent.recv_handler(&mad_snoop_priv->agent, NULL,
mad_recv_wc);
deref_snoop_agent(mad_snoop_priv);
spin_lock_irqsave(&qp_info->snoop_lock, flags);
}
spin_unlock_irqrestore(&qp_info->snoop_lock, flags);
}
static void build_smp_wc(struct ib_qp *qp, struct ib_cqe *cqe, u16 slid,
u16 pkey_index, u8 port_num, struct ib_wc *wc)
{
@ -2289,9 +2080,6 @@ static void ib_mad_recv_done(struct ib_cq *cq, struct ib_wc *wc)
recv->header.recv_wc.recv_buf.mad = (struct ib_mad *)recv->mad;
recv->header.recv_wc.recv_buf.grh = &recv->grh;
if (atomic_read(&qp_info->snoop_count))
snoop_recv(qp_info, &recv->header.recv_wc, IB_MAD_SNOOP_RECVS);
/* Validate MAD */
if (!validate_mad((const struct ib_mad_hdr *)recv->mad, qp_info, opa))
goto out;
@ -2538,9 +2326,6 @@ retry:
mad_send_wc.send_buf = &mad_send_wr->send_buf;
mad_send_wc.status = wc->status;
mad_send_wc.vendor_err = wc->vendor_err;
if (atomic_read(&qp_info->snoop_count))
snoop_send(qp_info, &mad_send_wr->send_buf, &mad_send_wc,
IB_MAD_SNOOP_SEND_COMPLETIONS);
ib_mad_complete_send_wr(mad_send_wr, &mad_send_wc);
if (queued_send_wr) {
@ -2782,10 +2567,6 @@ static void local_completions(struct work_struct *work)
local->mad_priv->header.recv_wc.recv_buf.grh = NULL;
local->mad_priv->header.recv_wc.recv_buf.mad =
(struct ib_mad *)local->mad_priv->mad;
if (atomic_read(&recv_mad_agent->qp_info->snoop_count))
snoop_recv(recv_mad_agent->qp_info,
&local->mad_priv->header.recv_wc,
IB_MAD_SNOOP_RECVS);
recv_mad_agent->agent.recv_handler(
&recv_mad_agent->agent,
&local->mad_send_wr->send_buf,
@ -2800,10 +2581,6 @@ local_send_completion:
mad_send_wc.status = IB_WC_SUCCESS;
mad_send_wc.vendor_err = 0;
mad_send_wc.send_buf = &local->mad_send_wr->send_buf;
if (atomic_read(&mad_agent_priv->qp_info->snoop_count))
snoop_send(mad_agent_priv->qp_info,
&local->mad_send_wr->send_buf,
&mad_send_wc, IB_MAD_SNOOP_SEND_COMPLETIONS);
mad_agent_priv->agent.send_handler(&mad_agent_priv->agent,
&mad_send_wc);
@ -3119,10 +2896,6 @@ static void init_mad_qp(struct ib_mad_port_private *port_priv,
init_mad_queue(qp_info, &qp_info->send_queue);
init_mad_queue(qp_info, &qp_info->recv_queue);
INIT_LIST_HEAD(&qp_info->overflow_list);
spin_lock_init(&qp_info->snoop_lock);
qp_info->snoop_table = NULL;
qp_info->snoop_table_size = 0;
atomic_set(&qp_info->snoop_count, 0);
}
static int create_mad_qp(struct ib_mad_qp_info *qp_info,
@ -3166,7 +2939,6 @@ static void destroy_mad_qp(struct ib_mad_qp_info *qp_info)
return;
ib_destroy_qp(qp_info->qp);
kfree(qp_info->snoop_table);
}
/*
@ -3304,9 +3076,11 @@ static int ib_mad_port_close(struct ib_device *device, int port_num)
return 0;
}
static void ib_mad_init_device(struct ib_device *device)
static int ib_mad_init_device(struct ib_device *device)
{
int start, i;
unsigned int count = 0;
int ret;
start = rdma_start_port(device);
@ -3314,17 +3088,23 @@ static void ib_mad_init_device(struct ib_device *device)
if (!rdma_cap_ib_mad(device, i))
continue;
if (ib_mad_port_open(device, i)) {
ret = ib_mad_port_open(device, i);
if (ret) {
dev_err(&device->dev, "Couldn't open port %d\n", i);
goto error;
}
if (ib_agent_port_open(device, i)) {
ret = ib_agent_port_open(device, i);
if (ret) {
dev_err(&device->dev,
"Couldn't open port %d for agents\n", i);
goto error_agent;
}
count++;
}
return;
if (!count)
return -EOPNOTSUPP;
return 0;
error_agent:
if (ib_mad_port_close(device, i))
@ -3341,6 +3121,7 @@ error:
if (ib_mad_port_close(device, i))
dev_err(&device->dev, "Couldn't close port %d\n", i);
}
return ret;
}
static void ib_mad_remove_device(struct ib_device *device, void *client_data)


@ -42,7 +42,7 @@
#include <rdma/ib_cache.h>
#include "sa.h"
static void mcast_add_one(struct ib_device *device);
static int mcast_add_one(struct ib_device *device);
static void mcast_remove_one(struct ib_device *device, void *client_data);
static struct ib_client mcast_client = {
@ -815,7 +815,7 @@ static void mcast_event_handler(struct ib_event_handler *handler,
}
}
static void mcast_add_one(struct ib_device *device)
static int mcast_add_one(struct ib_device *device)
{
struct mcast_device *dev;
struct mcast_port *port;
@ -825,7 +825,7 @@ static void mcast_add_one(struct ib_device *device)
dev = kmalloc(struct_size(dev, port, device->phys_port_cnt),
GFP_KERNEL);
if (!dev)
return;
return -ENOMEM;
dev->start_port = rdma_start_port(device);
dev->end_port = rdma_end_port(device);
@ -845,7 +845,7 @@ static void mcast_add_one(struct ib_device *device)
if (!count) {
kfree(dev);
return;
return -EOPNOTSUPP;
}
dev->device = device;
@ -853,6 +853,7 @@ static void mcast_add_one(struct ib_device *device)
INIT_IB_EVENT_HANDLER(&dev->event_handler, device, mcast_event_handler);
ib_register_event_handler(&dev->event_handler);
return 0;
}
static void mcast_remove_one(struct ib_device *device, void *client_data)
@ -861,9 +862,6 @@ static void mcast_remove_one(struct ib_device *device, void *client_data)
struct mcast_port *port;
int i;
if (!dev)
return;
ib_unregister_event_handler(&dev->event_handler);
flush_workqueue(mcast_wq);


@ -130,6 +130,17 @@ static int uverbs_destroy_uobject(struct ib_uobject *uobj,
lockdep_assert_held(&ufile->hw_destroy_rwsem);
assert_uverbs_usecnt(uobj, UVERBS_LOOKUP_WRITE);
if (reason == RDMA_REMOVE_ABORT_HWOBJ) {
reason = RDMA_REMOVE_ABORT;
ret = uobj->uapi_object->type_class->destroy_hw(uobj, reason,
attrs);
/*
* Drivers are not permitted to ignore RDMA_REMOVE_ABORT, see
* ib_is_destroy_retryable, cleanup_retryable == false here.
*/
WARN_ON(ret);
}
if (reason == RDMA_REMOVE_ABORT) {
WARN_ON(!list_empty(&uobj->list));
WARN_ON(!uobj->context);
@ -653,11 +664,15 @@ void rdma_alloc_commit_uobject(struct ib_uobject *uobj,
* object and anything else connected to uobj before calling this.
*/
void rdma_alloc_abort_uobject(struct ib_uobject *uobj,
struct uverbs_attr_bundle *attrs)
struct uverbs_attr_bundle *attrs,
bool hw_obj_valid)
{
struct ib_uverbs_file *ufile = uobj->ufile;
uverbs_destroy_uobject(uobj, RDMA_REMOVE_ABORT, attrs);
uverbs_destroy_uobject(uobj,
hw_obj_valid ? RDMA_REMOVE_ABORT_HWOBJ :
RDMA_REMOVE_ABORT,
attrs);
/* Matches the down_read in rdma_alloc_begin_uobject */
up_read(&ufile->hw_destroy_rwsem);
@ -927,8 +942,8 @@ uverbs_get_uobject_from_file(u16 object_id, enum uverbs_obj_access access,
}
void uverbs_finalize_object(struct ib_uobject *uobj,
enum uverbs_obj_access access, bool commit,
struct uverbs_attr_bundle *attrs)
enum uverbs_obj_access access, bool hw_obj_valid,
bool commit, struct uverbs_attr_bundle *attrs)
{
/*
* refcounts should be handled at the object level and not at the
@@ -951,7 +966,7 @@ void uverbs_finalize_object(struct ib_uobject *uobj,
if (commit)
rdma_alloc_commit_uobject(uobj, attrs);
else
rdma_alloc_abort_uobject(uobj, attrs);
rdma_alloc_abort_uobject(uobj, attrs, hw_obj_valid);
break;
default:
WARN_ON(true);


@@ -64,8 +64,8 @@ uverbs_get_uobject_from_file(u16 object_id, enum uverbs_obj_access access,
s64 id, struct uverbs_attr_bundle *attrs);
void uverbs_finalize_object(struct ib_uobject *uobj,
enum uverbs_obj_access access, bool commit,
struct uverbs_attr_bundle *attrs);
enum uverbs_obj_access access, bool hw_obj_valid,
bool commit, struct uverbs_attr_bundle *attrs);
int uverbs_output_written(const struct uverbs_attr_bundle *bundle, size_t idx);
@@ -159,6 +159,9 @@ extern const struct uapi_definition uverbs_def_obj_dm[];
extern const struct uapi_definition uverbs_def_obj_flow_action[];
extern const struct uapi_definition uverbs_def_obj_intf[];
extern const struct uapi_definition uverbs_def_obj_mr[];
extern const struct uapi_definition uverbs_def_obj_qp[];
extern const struct uapi_definition uverbs_def_obj_srq[];
extern const struct uapi_definition uverbs_def_obj_wq[];
extern const struct uapi_definition uverbs_def_write_intf[];
static inline const struct uverbs_api_write_method *


@@ -129,7 +129,7 @@ static int rdma_rw_init_mr_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
qp->integrity_en);
int i, j, ret = 0, count = 0;
ctx->nr_ops = (sg_cnt + pages_per_mr - 1) / pages_per_mr;
ctx->nr_ops = DIV_ROUND_UP(sg_cnt, pages_per_mr);
ctx->reg = kcalloc(ctx->nr_ops, sizeof(*ctx->reg), GFP_KERNEL);
if (!ctx->reg) {
ret = -ENOMEM;


@@ -174,7 +174,7 @@ static const struct nla_policy ib_nl_policy[LS_NLA_TYPE_MAX] = {
};
static void ib_sa_add_one(struct ib_device *device);
static int ib_sa_add_one(struct ib_device *device);
static void ib_sa_remove_one(struct ib_device *device, void *client_data);
static struct ib_client sa_client = {
@@ -190,7 +190,7 @@ static u32 tid;
#define PATH_REC_FIELD(field) \
.struct_offset_bytes = offsetof(struct sa_path_rec, field), \
.struct_size_bytes = sizeof((struct sa_path_rec *)0)->field, \
.struct_size_bytes = sizeof_field(struct sa_path_rec, field), \
.field_name = "sa_path_rec:" #field
static const struct ib_field path_rec_table[] = {
@@ -292,7 +292,7 @@ static const struct ib_field path_rec_table[] = {
.struct_offset_bytes = \
offsetof(struct sa_path_rec, field), \
.struct_size_bytes = \
sizeof((struct sa_path_rec *)0)->field, \
sizeof_field(struct sa_path_rec, field), \
.field_name = "sa_path_rec:" #field
static const struct ib_field opa_path_rec_table[] = {
@@ -420,7 +420,7 @@ static const struct ib_field opa_path_rec_table[] = {
#define MCMEMBER_REC_FIELD(field) \
.struct_offset_bytes = offsetof(struct ib_sa_mcmember_rec, field), \
.struct_size_bytes = sizeof ((struct ib_sa_mcmember_rec *) 0)->field, \
.struct_size_bytes = sizeof_field(struct ib_sa_mcmember_rec, field), \
.field_name = "sa_mcmember_rec:" #field
static const struct ib_field mcmember_rec_table[] = {
@@ -504,7 +504,7 @@ static const struct ib_field mcmember_rec_table[] = {
#define SERVICE_REC_FIELD(field) \
.struct_offset_bytes = offsetof(struct ib_sa_service_rec, field), \
.struct_size_bytes = sizeof ((struct ib_sa_service_rec *) 0)->field, \
.struct_size_bytes = sizeof_field(struct ib_sa_service_rec, field), \
.field_name = "sa_service_rec:" #field
static const struct ib_field service_rec_table[] = {
@@ -552,7 +552,7 @@ static const struct ib_field service_rec_table[] = {
#define CLASSPORTINFO_REC_FIELD(field) \
.struct_offset_bytes = offsetof(struct ib_class_port_info, field), \
.struct_size_bytes = sizeof((struct ib_class_port_info *)0)->field, \
.struct_size_bytes = sizeof_field(struct ib_class_port_info, field), \
.field_name = "ib_class_port_info:" #field
static const struct ib_field ib_classport_info_rec_table[] = {
@@ -630,7 +630,7 @@ static const struct ib_field ib_classport_info_rec_table[] = {
.struct_offset_bytes =\
offsetof(struct opa_class_port_info, field), \
.struct_size_bytes = \
sizeof((struct opa_class_port_info *)0)->field, \
sizeof_field(struct opa_class_port_info, field), \
.field_name = "opa_class_port_info:" #field
static const struct ib_field opa_classport_info_rec_table[] = {
@@ -710,7 +710,7 @@ static const struct ib_field opa_classport_info_rec_table[] = {
#define GUIDINFO_REC_FIELD(field) \
.struct_offset_bytes = offsetof(struct ib_sa_guidinfo_rec, field), \
.struct_size_bytes = sizeof((struct ib_sa_guidinfo_rec *) 0)->field, \
.struct_size_bytes = sizeof_field(struct ib_sa_guidinfo_rec, field), \
.field_name = "sa_guidinfo_rec:" #field
static const struct ib_field guidinfo_rec_table[] = {
@@ -1412,17 +1412,13 @@ void ib_sa_pack_path(struct sa_path_rec *rec, void *attribute)
EXPORT_SYMBOL(ib_sa_pack_path);
static bool ib_sa_opa_pathrecord_support(struct ib_sa_client *client,
struct ib_device *device,
struct ib_sa_device *sa_dev,
u8 port_num)
{
struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client);
struct ib_sa_port *port;
unsigned long flags;
bool ret = false;
if (!sa_dev)
return ret;
port = &sa_dev->port[port_num - sa_dev->start_port];
spin_lock_irqsave(&port->classport_lock, flags);
if (!port->classport_info.valid)
@@ -1450,8 +1446,8 @@ enum opa_pr_supported {
* query is possible.
*/
static int opa_pr_query_possible(struct ib_sa_client *client,
struct ib_device *device,
u8 port_num,
struct ib_sa_device *sa_dev,
struct ib_device *device, u8 port_num,
struct sa_path_rec *rec)
{
struct ib_port_attr port_attr;
@@ -1459,7 +1455,7 @@ static int opa_pr_query_possible(struct ib_sa_client *client,
if (ib_query_port(device, port_num, &port_attr))
return PR_NOT_SUPPORTED;
if (ib_sa_opa_pathrecord_support(client, device, port_num))
if (ib_sa_opa_pathrecord_support(client, sa_dev, port_num))
return PR_OPA_SUPPORTED;
if (port_attr.lid >= be16_to_cpu(IB_MULTICAST_LID_BASE))
@@ -1574,7 +1570,8 @@ int ib_sa_path_rec_get(struct ib_sa_client *client,
query->sa_query.port = port;
if (rec->rec_type == SA_PATH_REC_TYPE_OPA) {
status = opa_pr_query_possible(client, device, port_num, rec);
status = opa_pr_query_possible(client, sa_dev, device, port_num,
rec);
if (status == PR_NOT_SUPPORTED) {
ret = -EINVAL;
goto err1;
@@ -2325,18 +2322,19 @@ static void ib_sa_event(struct ib_event_handler *handler,
}
}
static void ib_sa_add_one(struct ib_device *device)
static int ib_sa_add_one(struct ib_device *device)
{
struct ib_sa_device *sa_dev;
int s, e, i;
int count = 0;
int ret;
s = rdma_start_port(device);
e = rdma_end_port(device);
sa_dev = kzalloc(struct_size(sa_dev, port, e - s + 1), GFP_KERNEL);
if (!sa_dev)
return;
return -ENOMEM;
sa_dev->start_port = s;
sa_dev->end_port = e;
@@ -2356,8 +2354,10 @@ static void ib_sa_add_one(struct ib_device *device)
ib_register_mad_agent(device, i + s, IB_QPT_GSI,
NULL, 0, send_handler,
recv_handler, sa_dev, 0);
if (IS_ERR(sa_dev->port[i].agent))
if (IS_ERR(sa_dev->port[i].agent)) {
ret = PTR_ERR(sa_dev->port[i].agent);
goto err;
}
INIT_WORK(&sa_dev->port[i].update_task, update_sm_ah);
INIT_DELAYED_WORK(&sa_dev->port[i].ib_cpi_work,
@@ -2366,8 +2366,10 @@ static void ib_sa_add_one(struct ib_device *device)
count++;
}
if (!count)
if (!count) {
ret = -EOPNOTSUPP;
goto free;
}
ib_set_client_data(device, &sa_client, sa_dev);
@@ -2386,7 +2388,7 @@ static void ib_sa_add_one(struct ib_device *device)
update_sm_ah(&sa_dev->port[i].update_task);
}
return;
return 0;
err:
while (--i >= 0) {
@@ -2395,7 +2397,7 @@ err:
}
free:
kfree(sa_dev);
return;
return ret;
}
static void ib_sa_remove_one(struct ib_device *device, void *client_data)
@@ -2403,9 +2405,6 @@ static void ib_sa_remove_one(struct ib_device *device, void *client_data)
struct ib_sa_device *sa_dev = client_data;
int i;
if (!sa_dev)
return;
ib_unregister_event_handler(&sa_dev->event_handler);
flush_workqueue(ib_wq);


@@ -1058,8 +1058,7 @@ static int add_port(struct ib_core_device *coredev, int port_num)
coredev->ports_kobj,
"%d", port_num);
if (ret) {
kfree(p);
return ret;
goto err_put;
}
p->gid_attr_group = kzalloc(sizeof(*p->gid_attr_group), GFP_KERNEL);
@@ -1072,8 +1071,7 @@ static int add_port(struct ib_core_device *coredev, int port_num)
ret = kobject_init_and_add(&p->gid_attr_group->kobj, &gid_attr_type,
&p->kobj, "gid_attrs");
if (ret) {
kfree(p->gid_attr_group);
goto err_put;
goto err_put_gid_attrs;
}
if (device->ops.process_mad && is_full_dev) {
@@ -1404,8 +1402,10 @@ int ib_port_register_module_stat(struct ib_device *device, u8 port_num,
ret = kobject_init_and_add(kobj, ktype, &port->kobj, "%s",
name);
if (ret)
if (ret) {
kobject_put(kobj);
return ret;
}
}
return 0;


@@ -52,6 +52,7 @@
#include <rdma/rdma_cm_ib.h>
#include <rdma/ib_addr.h>
#include <rdma/ib.h>
#include <rdma/ib_cm.h>
#include <rdma/rdma_netlink.h>
#include "core_priv.h"
@@ -360,6 +361,9 @@ static int ucma_event_handler(struct rdma_cm_id *cm_id,
ucma_copy_conn_event(&uevent->resp.param.conn,
&event->param.conn);
uevent->resp.ece.vendor_id = event->ece.vendor_id;
uevent->resp.ece.attr_mod = event->ece.attr_mod;
if (event->event == RDMA_CM_EVENT_CONNECT_REQUEST) {
if (!ctx->backlog) {
ret = -ENOMEM;
@@ -404,7 +408,8 @@ static ssize_t ucma_get_event(struct ucma_file *file, const char __user *inbuf,
* Old 32 bit user space does not send the 4 byte padding in the
* reserved field. We don't care, allow it to keep working.
*/
if (out_len < sizeof(uevent->resp) - sizeof(uevent->resp.reserved))
if (out_len < sizeof(uevent->resp) - sizeof(uevent->resp.reserved) -
sizeof(uevent->resp.ece))
return -ENOSPC;
if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
@@ -845,7 +850,7 @@ static ssize_t ucma_query_route(struct ucma_file *file,
struct sockaddr *addr;
int ret = 0;
if (out_len < sizeof(resp))
if (out_len < offsetof(struct rdma_ucm_query_route_resp, ibdev_index))
return -ENOSPC;
if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
@@ -869,6 +874,7 @@ static ssize_t ucma_query_route(struct ucma_file *file,
goto out;
resp.node_guid = (__force __u64) ctx->cm_id->device->node_guid;
resp.ibdev_index = ctx->cm_id->device->index;
resp.port_num = ctx->cm_id->port_num;
if (rdma_cap_ib_sa(ctx->cm_id->device, ctx->cm_id->port_num))
@@ -880,8 +886,8 @@ static ssize_t ucma_query_route(struct ucma_file *file,
out:
mutex_unlock(&ctx->mutex);
if (copy_to_user(u64_to_user_ptr(cmd.response),
&resp, sizeof(resp)))
if (copy_to_user(u64_to_user_ptr(cmd.response), &resp,
min_t(size_t, out_len, sizeof(resp))))
ret = -EFAULT;
ucma_put_ctx(ctx);
@@ -895,6 +901,7 @@ static void ucma_query_device_addr(struct rdma_cm_id *cm_id,
return;
resp->node_guid = (__force __u64) cm_id->device->node_guid;
resp->ibdev_index = cm_id->device->index;
resp->port_num = cm_id->port_num;
resp->pkey = (__force __u16) cpu_to_be16(
ib_addr_get_pkey(&cm_id->route.addr.dev_addr));
@@ -907,7 +914,7 @@ static ssize_t ucma_query_addr(struct ucma_context *ctx,
struct sockaddr *addr;
int ret = 0;
if (out_len < sizeof(resp))
if (out_len < offsetof(struct rdma_ucm_query_addr_resp, ibdev_index))
return -ENOSPC;
memset(&resp, 0, sizeof resp);
@@ -922,7 +929,7 @@ static ssize_t ucma_query_addr(struct ucma_context *ctx,
ucma_query_device_addr(ctx->cm_id, &resp);
if (copy_to_user(response, &resp, sizeof(resp)))
if (copy_to_user(response, &resp, min_t(size_t, out_len, sizeof(resp))))
ret = -EFAULT;
return ret;
@@ -974,7 +981,7 @@ static ssize_t ucma_query_gid(struct ucma_context *ctx,
struct sockaddr_ib *addr;
int ret = 0;
if (out_len < sizeof(resp))
if (out_len < offsetof(struct rdma_ucm_query_addr_resp, ibdev_index))
return -ENOSPC;
memset(&resp, 0, sizeof resp);
@@ -1007,7 +1014,7 @@ static ssize_t ucma_query_gid(struct ucma_context *ctx,
&ctx->cm_id->route.addr.dst_addr);
}
if (copy_to_user(response, &resp, sizeof(resp)))
if (copy_to_user(response, &resp, min_t(size_t, out_len, sizeof(resp))))
ret = -EFAULT;
return ret;
@@ -1070,12 +1077,15 @@ static void ucma_copy_conn_param(struct rdma_cm_id *id,
static ssize_t ucma_connect(struct ucma_file *file, const char __user *inbuf,
int in_len, int out_len)
{
struct rdma_ucm_connect cmd;
struct rdma_conn_param conn_param;
struct rdma_ucm_ece ece = {};
struct rdma_ucm_connect cmd;
struct ucma_context *ctx;
size_t in_size;
int ret;
if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
in_size = min_t(size_t, in_len, sizeof(cmd));
if (copy_from_user(&cmd, inbuf, in_size))
return -EFAULT;
if (!cmd.conn_param.valid)
@@ -1086,8 +1096,13 @@ static ssize_t ucma_connect(struct ucma_file *file, const char __user *inbuf,
return PTR_ERR(ctx);
ucma_copy_conn_param(ctx->cm_id, &conn_param, &cmd.conn_param);
if (offsetofend(typeof(cmd), ece) <= in_size) {
ece.vendor_id = cmd.ece.vendor_id;
ece.attr_mod = cmd.ece.attr_mod;
}
mutex_lock(&ctx->mutex);
ret = rdma_connect(ctx->cm_id, &conn_param);
ret = rdma_connect_ece(ctx->cm_id, &conn_param, &ece);
mutex_unlock(&ctx->mutex);
ucma_put_ctx(ctx);
return ret;
@@ -1121,28 +1136,36 @@ static ssize_t ucma_accept(struct ucma_file *file, const char __user *inbuf,
{
struct rdma_ucm_accept cmd;
struct rdma_conn_param conn_param;
struct rdma_ucm_ece ece = {};
struct ucma_context *ctx;
size_t in_size;
int ret;
if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
in_size = min_t(size_t, in_len, sizeof(cmd));
if (copy_from_user(&cmd, inbuf, in_size))
return -EFAULT;
ctx = ucma_get_ctx_dev(file, cmd.id);
if (IS_ERR(ctx))
return PTR_ERR(ctx);
if (offsetofend(typeof(cmd), ece) <= in_size) {
ece.vendor_id = cmd.ece.vendor_id;
ece.attr_mod = cmd.ece.attr_mod;
}
if (cmd.conn_param.valid) {
ucma_copy_conn_param(ctx->cm_id, &conn_param, &cmd.conn_param);
mutex_lock(&file->mut);
mutex_lock(&ctx->mutex);
ret = __rdma_accept(ctx->cm_id, &conn_param, NULL);
ret = __rdma_accept_ece(ctx->cm_id, &conn_param, NULL, &ece);
mutex_unlock(&ctx->mutex);
if (!ret)
ctx->uid = cmd.uid;
mutex_unlock(&file->mut);
} else {
mutex_lock(&ctx->mutex);
ret = __rdma_accept(ctx->cm_id, NULL, NULL);
ret = __rdma_accept_ece(ctx->cm_id, NULL, NULL, &ece);
mutex_unlock(&ctx->mutex);
}
ucma_put_ctx(ctx);
@@ -1159,12 +1182,24 @@ static ssize_t ucma_reject(struct ucma_file *file, const char __user *inbuf,
if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
return -EFAULT;
if (!cmd.reason)
cmd.reason = IB_CM_REJ_CONSUMER_DEFINED;
switch (cmd.reason) {
case IB_CM_REJ_CONSUMER_DEFINED:
case IB_CM_REJ_VENDOR_OPTION_NOT_SUPPORTED:
break;
default:
return -EINVAL;
}
ctx = ucma_get_ctx_dev(file, cmd.id);
if (IS_ERR(ctx))
return PTR_ERR(ctx);
mutex_lock(&ctx->mutex);
ret = rdma_reject(ctx->cm_id, cmd.private_data, cmd.private_data_len);
ret = rdma_reject(ctx->cm_id, cmd.private_data, cmd.private_data_len,
cmd.reason);
mutex_unlock(&ctx->mutex);
ucma_put_ctx(ctx);
return ret;


@@ -41,7 +41,7 @@
#define STRUCT_FIELD(header, field) \
.struct_offset_bytes = offsetof(struct ib_unpacked_ ## header, field), \
.struct_size_bytes = sizeof ((struct ib_unpacked_ ## header *) 0)->field, \
.struct_size_bytes = sizeof_field(struct ib_unpacked_ ## header, field), \
.field_name = #header ":" #field
static const struct ib_field lrh_table[] = {


@@ -142,7 +142,7 @@ static dev_t dynamic_issm_dev;
static DEFINE_IDA(umad_ida);
static void ib_umad_add_one(struct ib_device *device);
static int ib_umad_add_one(struct ib_device *device);
static void ib_umad_remove_one(struct ib_device *device, void *client_data);
static void ib_umad_dev_free(struct kref *kref)
@@ -1352,37 +1352,41 @@ static void ib_umad_kill_port(struct ib_umad_port *port)
put_device(&port->dev);
}
static void ib_umad_add_one(struct ib_device *device)
static int ib_umad_add_one(struct ib_device *device)
{
struct ib_umad_device *umad_dev;
int s, e, i;
int count = 0;
int ret;
s = rdma_start_port(device);
e = rdma_end_port(device);
umad_dev = kzalloc(struct_size(umad_dev, ports, e - s + 1), GFP_KERNEL);
if (!umad_dev)
return;
return -ENOMEM;
kref_init(&umad_dev->kref);
for (i = s; i <= e; ++i) {
if (!rdma_cap_ib_mad(device, i))
continue;
if (ib_umad_init_port(device, i, umad_dev,
&umad_dev->ports[i - s]))
ret = ib_umad_init_port(device, i, umad_dev,
&umad_dev->ports[i - s]);
if (ret)
goto err;
count++;
}
if (!count)
if (!count) {
ret = -EOPNOTSUPP;
goto free;
}
ib_set_client_data(device, &umad_client, umad_dev);
return;
return 0;
err:
while (--i >= s) {
@@ -1394,6 +1398,7 @@ err:
free:
/* balances kref_init */
ib_umad_dev_put(umad_dev);
return ret;
}
static void ib_umad_remove_one(struct ib_device *device, void *client_data)
@@ -1401,9 +1406,6 @@ static void ib_umad_remove_one(struct ib_device *device, void *client_data)
struct ib_umad_device *umad_dev = client_data;
unsigned int i;
if (!umad_dev)
return;
rdma_for_each_port (device, i) {
if (rdma_cap_ib_mad(device, i))
ib_umad_kill_port(


@@ -142,7 +142,7 @@ struct ib_uverbs_file {
* ucontext_lock held
*/
struct ib_ucontext *ucontext;
struct ib_uverbs_async_event_file *async_file;
struct ib_uverbs_async_event_file *default_async_file;
struct list_head list;
/*
@@ -180,6 +180,7 @@ struct ib_uverbs_mcast_entry {
struct ib_uevent_object {
struct ib_uobject uobject;
struct ib_uverbs_async_event_file *event_file;
/* List member for ib_uverbs_async_event_file list */
struct list_head event_list;
u32 events_reported;
@@ -296,6 +297,24 @@ static inline u32 make_port_cap_flags(const struct ib_port_attr *attr)
return res;
}
static inline struct ib_uverbs_async_event_file *
ib_uverbs_get_async_event(struct uverbs_attr_bundle *attrs,
u16 id)
{
struct ib_uobject *async_ev_file_uobj;
struct ib_uverbs_async_event_file *async_ev_file;
async_ev_file_uobj = uverbs_attr_get_uobject(attrs, id);
if (IS_ERR(async_ev_file_uobj))
async_ev_file = READ_ONCE(attrs->ufile->default_async_file);
else
async_ev_file = container_of(async_ev_file_uobj,
struct ib_uverbs_async_event_file,
uobj);
if (async_ev_file)
uverbs_uobject_get(&async_ev_file->uobj);
return async_ev_file;
}
void copy_port_attr_to_resp(struct ib_port_attr *attr,
struct ib_uverbs_query_port_resp *resp,


@@ -311,7 +311,7 @@ static int ib_uverbs_get_context(struct uverbs_attr_bundle *attrs)
return 0;
err_uobj:
rdma_alloc_abort_uobject(uobj, attrs);
rdma_alloc_abort_uobject(uobj, attrs, false);
err_ucontext:
kfree(attrs->context);
attrs->context = NULL;
@@ -356,8 +356,6 @@ static void copy_query_dev_fields(struct ib_ucontext *ucontext,
resp->max_mcast_qp_attach = attr->max_mcast_qp_attach;
resp->max_total_mcast_qp_attach = attr->max_total_mcast_qp_attach;
resp->max_ah = attr->max_ah;
resp->max_fmr = attr->max_fmr;
resp->max_map_per_fmr = attr->max_map_per_fmr;
resp->max_srq = attr->max_srq;
resp->max_srq_wr = attr->max_srq_wr;
resp->max_srq_sge = attr->max_srq_sge;
@@ -1051,6 +1049,10 @@ static struct ib_ucq_object *create_cq(struct uverbs_attr_bundle *attrs,
goto err_free;
obj->uevent.uobject.object = cq;
obj->uevent.event_file = READ_ONCE(attrs->ufile->default_async_file);
if (obj->uevent.event_file)
uverbs_uobject_get(&obj->uevent.event_file->uobj);
memset(&resp, 0, sizeof resp);
resp.base.cq_handle = obj->uevent.uobject.id;
resp.base.cqe = cq->cqe;
@@ -1067,6 +1069,8 @@ static struct ib_ucq_object *create_cq(struct uverbs_attr_bundle *attrs,
return obj;
err_cb:
if (obj->uevent.event_file)
uverbs_uobject_put(&obj->uevent.event_file->uobj);
ib_destroy_cq_user(cq, uverbs_get_cleared_udata(attrs));
cq = NULL;
err_free:
@@ -1460,6 +1464,9 @@ static int create_qp(struct uverbs_attr_bundle *attrs,
}
obj->uevent.uobject.object = qp;
obj->uevent.event_file = READ_ONCE(attrs->ufile->default_async_file);
if (obj->uevent.event_file)
uverbs_uobject_get(&obj->uevent.event_file->uobj);
memset(&resp, 0, sizeof resp);
resp.base.qpn = qp->qp_num;
@@ -1473,7 +1480,7 @@ static int create_qp(struct uverbs_attr_bundle *attrs,
ret = uverbs_response(attrs, &resp, sizeof(resp));
if (ret)
goto err_cb;
goto err_uevent;
if (xrcd) {
obj->uxrcd = container_of(xrcd_uobj, struct ib_uxrcd_object,
@@ -1498,6 +1505,9 @@ static int create_qp(struct uverbs_attr_bundle *attrs,
rdma_alloc_commit_uobject(&obj->uevent.uobject, attrs);
return 0;
err_uevent:
if (obj->uevent.event_file)
uverbs_uobject_put(&obj->uevent.event_file->uobj);
err_cb:
ib_destroy_qp_user(qp, uverbs_get_cleared_udata(attrs));
@@ -2954,11 +2964,11 @@ static int ib_uverbs_ex_create_wq(struct uverbs_attr_bundle *attrs)
wq_init_attr.cq = cq;
wq_init_attr.max_sge = cmd.max_sge;
wq_init_attr.max_wr = cmd.max_wr;
wq_init_attr.wq_context = attrs->ufile;
wq_init_attr.wq_type = cmd.wq_type;
wq_init_attr.event_handler = ib_uverbs_wq_event_handler;
wq_init_attr.create_flags = cmd.create_flags;
INIT_LIST_HEAD(&obj->uevent.event_list);
obj->uevent.uobject.user_handle = cmd.user_handle;
wq = pd->device->ops.create_wq(pd, &wq_init_attr, &attrs->driver_udata);
if (IS_ERR(wq)) {
@@ -2972,12 +2982,12 @@ static int ib_uverbs_ex_create_wq(struct uverbs_attr_bundle *attrs)
wq->cq = cq;
wq->pd = pd;
wq->device = pd->device;
wq->wq_context = wq_init_attr.wq_context;
atomic_set(&wq->usecnt, 0);
atomic_inc(&pd->usecnt);
atomic_inc(&cq->usecnt);
wq->uobject = obj;
obj->uevent.uobject.object = wq;
obj->uevent.event_file = READ_ONCE(attrs->ufile->default_async_file);
if (obj->uevent.event_file)
uverbs_uobject_get(&obj->uevent.event_file->uobj);
memset(&resp, 0, sizeof(resp));
resp.wq_handle = obj->uevent.uobject.id;
@@ -2996,6 +3006,8 @@ static int ib_uverbs_ex_create_wq(struct uverbs_attr_bundle *attrs)
return 0;
err_copy:
if (obj->uevent.event_file)
uverbs_uobject_put(&obj->uevent.event_file->uobj);
ib_destroy_wq(wq, uverbs_get_cleared_udata(attrs));
err_put_cq:
rdma_lookup_put_uobject(&cq->uobject->uevent.uobject,
@@ -3441,46 +3453,25 @@ static int __uverbs_create_xsrq(struct uverbs_attr_bundle *attrs,
}
attr.event_handler = ib_uverbs_srq_event_handler;
attr.srq_context = attrs->ufile;
attr.srq_type = cmd->srq_type;
attr.attr.max_wr = cmd->max_wr;
attr.attr.max_sge = cmd->max_sge;
attr.attr.srq_limit = cmd->srq_limit;
INIT_LIST_HEAD(&obj->uevent.event_list);
obj->uevent.uobject.user_handle = cmd->user_handle;
srq = rdma_zalloc_drv_obj(ib_dev, ib_srq);
if (!srq) {
ret = -ENOMEM;
goto err_put;
srq = ib_create_srq_user(pd, &attr, obj, udata);
if (IS_ERR(srq)) {
ret = PTR_ERR(srq);
goto err_put_pd;
}
srq->device = pd->device;
srq->pd = pd;
srq->srq_type = cmd->srq_type;
srq->uobject = obj;
srq->event_handler = attr.event_handler;
srq->srq_context = attr.srq_context;
ret = pd->device->ops.create_srq(srq, &attr, udata);
if (ret)
goto err_free;
if (ib_srq_has_cq(cmd->srq_type)) {
srq->ext.cq = attr.ext.cq;
atomic_inc(&attr.ext.cq->usecnt);
}
if (cmd->srq_type == IB_SRQT_XRC) {
srq->ext.xrc.xrcd = attr.ext.xrc.xrcd;
atomic_inc(&attr.ext.xrc.xrcd->usecnt);
}
atomic_inc(&pd->usecnt);
atomic_set(&srq->usecnt, 0);
obj->uevent.uobject.object = srq;
obj->uevent.uobject.user_handle = cmd->user_handle;
obj->uevent.event_file = READ_ONCE(attrs->ufile->default_async_file);
if (obj->uevent.event_file)
uverbs_uobject_get(&obj->uevent.event_file->uobj);
memset(&resp, 0, sizeof resp);
resp.srq_handle = obj->uevent.uobject.id;
@@ -3505,14 +3496,11 @@ static int __uverbs_create_xsrq(struct uverbs_attr_bundle *attrs,
return 0;
err_copy:
if (obj->uevent.event_file)
uverbs_uobject_put(&obj->uevent.event_file->uobj);
ib_destroy_srq_user(srq, uverbs_get_cleared_udata(attrs));
/* It was released in ib_destroy_srq_user */
srq = NULL;
err_free:
kfree(srq);
err_put:
err_put_pd:
uobj_put_obj_read(pd);
err_put_cq:
if (ib_srq_has_cq(cmd->srq_type))
rdma_lookup_put_uobject(&attr.ext.cq->uobject->uevent.uobject,
@@ -3751,7 +3739,7 @@ static int ib_uverbs_ex_modify_cq(struct uverbs_attr_bundle *attrs)
#define UAPI_DEF_WRITE_IO(req, resp) \
.write.has_resp = 1 + \
BUILD_BUG_ON_ZERO(offsetof(req, response) != 0) + \
BUILD_BUG_ON_ZERO(sizeof(((req *)0)->response) != \
BUILD_BUG_ON_ZERO(sizeof_field(req, response) != \
sizeof(u64)), \
.write.req_size = sizeof(req), .write.resp_size = sizeof(resp)


@@ -58,6 +58,7 @@ struct bundle_priv {
DECLARE_BITMAP(uobj_finalize, UVERBS_API_ATTR_BKEY_LEN);
DECLARE_BITMAP(spec_finalize, UVERBS_API_ATTR_BKEY_LEN);
DECLARE_BITMAP(uobj_hw_obj_valid, UVERBS_API_ATTR_BKEY_LEN);
/*
* Must be last. bundle ends in a flex array which overlaps
@@ -136,7 +137,7 @@ EXPORT_SYMBOL(_uverbs_alloc);
static bool uverbs_is_attr_cleared(const struct ib_uverbs_attr *uattr,
u16 len)
{
if (uattr->len > sizeof(((struct ib_uverbs_attr *)0)->data))
if (uattr->len > sizeof_field(struct ib_uverbs_attr, data))
return ib_is_buffer_cleared(u64_to_user_ptr(uattr->data) + len,
uattr->len - len);
@@ -230,7 +231,8 @@ static void uverbs_free_idrs_array(const struct uverbs_api_attr *attr_uapi,
for (i = 0; i != attr->len; i++)
uverbs_finalize_object(attr->uobjects[i],
spec->u2.objs_arr.access, commit, attrs);
spec->u2.objs_arr.access, false, commit,
attrs);
}
static int uverbs_process_attr(struct bundle_priv *pbundle,
@@ -502,7 +504,9 @@ static void bundle_destroy(struct bundle_priv *pbundle, bool commit)
uverbs_finalize_object(
attr->obj_attr.uobject,
attr->obj_attr.attr_elm->spec.u.obj.access, commit,
attr->obj_attr.attr_elm->spec.u.obj.access,
test_bit(i, pbundle->uobj_hw_obj_valid),
commit,
&pbundle->bundle);
}
@@ -590,6 +594,8 @@ static int ib_uverbs_cmd_verbs(struct ib_uverbs_file *ufile,
sizeof(pbundle->bundle.attr_present));
memset(pbundle->uobj_finalize, 0, sizeof(pbundle->uobj_finalize));
memset(pbundle->spec_finalize, 0, sizeof(pbundle->spec_finalize));
memset(pbundle->uobj_hw_obj_valid, 0,
sizeof(pbundle->uobj_hw_obj_valid));
ret = ib_uverbs_run_method(pbundle, hdr->num_attrs);
bundle_destroy(pbundle, ret == 0);
@@ -784,3 +790,15 @@ int uverbs_copy_to_struct_or_zero(const struct uverbs_attr_bundle *bundle,
}
return uverbs_copy_to(bundle, idx, from, size);
}
/* Once called an abort will call through to the type's destroy_hw() */
void uverbs_finalize_uobj_create(const struct uverbs_attr_bundle *bundle,
u16 idx)
{
struct bundle_priv *pbundle =
container_of(bundle, struct bundle_priv, bundle);
__set_bit(uapi_bkey_attr(uapi_key_attr(idx)),
pbundle->uobj_hw_obj_valid);
}
EXPORT_SYMBOL(uverbs_finalize_uobj_create);


@@ -75,7 +75,7 @@ static dev_t dynamic_uverbs_dev;
static struct class *uverbs_class;
static DEFINE_IDA(uverbs_ida);
static void ib_uverbs_add_one(struct ib_device *device);
static int ib_uverbs_add_one(struct ib_device *device);
static void ib_uverbs_remove_one(struct ib_device *device, void *client_data);
/*
@@ -146,8 +146,7 @@ void ib_uverbs_release_ucq(struct ib_uverbs_completion_event_file *ev_file,
void ib_uverbs_release_uevent(struct ib_uevent_object *uobj)
{
struct ib_uverbs_async_event_file *async_file =
READ_ONCE(uobj->uobject.ufile->async_file);
struct ib_uverbs_async_event_file *async_file = uobj->event_file;
struct ib_uverbs_event *evt, *tmp;
if (!async_file)
@@ -159,6 +158,7 @@ void ib_uverbs_release_uevent(struct ib_uevent_object *uobj)
kfree(evt);
}
spin_unlock_irq(&async_file->ev_queue.lock);
uverbs_uobject_put(&async_file->uobj);
}
void ib_uverbs_detach_umcast(struct ib_qp *qp,
@@ -197,8 +197,8 @@ void ib_uverbs_release_file(struct kref *ref)
if (atomic_dec_and_test(&file->device->refcount))
ib_uverbs_comp_dev(file->device);
if (file->async_file)
uverbs_uobject_put(&file->async_file->uobj);
if (file->default_async_file)
uverbs_uobject_put(&file->default_async_file->uobj);
put_device(&file->device->dev);
if (file->disassociate_page)
@@ -296,6 +296,8 @@ static __poll_t ib_uverbs_event_poll(struct ib_uverbs_event_queue *ev_queue,
spin_lock_irq(&ev_queue->lock);
if (!list_empty(&ev_queue->event_list))
pollflags = EPOLLIN | EPOLLRDNORM;
else if (ev_queue->is_closed)
pollflags = EPOLLERR;
spin_unlock_irq(&ev_queue->lock);
return pollflags;
@@ -425,7 +427,7 @@ void ib_uverbs_async_handler(struct ib_uverbs_async_event_file *async_file,
static void uverbs_uobj_event(struct ib_uevent_object *eobj,
struct ib_event *event)
{
ib_uverbs_async_handler(READ_ONCE(eobj->uobject.ufile->async_file),
ib_uverbs_async_handler(eobj->event_file,
eobj->uobject.user_handle, event->event,
&eobj->event_list, &eobj->events_reported);
}
@@ -482,10 +484,10 @@ void ib_uverbs_init_async_event_file(
/* The first async_event_file becomes the default one for the file. */
mutex_lock(&uverbs_file->ucontext_lock);
if (!uverbs_file->async_file) {
if (!uverbs_file->default_async_file) {
/* Pairs with the put in ib_uverbs_release_file */
uverbs_uobject_get(&async_file->uobj);
smp_store_release(&uverbs_file->async_file, async_file);
smp_store_release(&uverbs_file->default_async_file, async_file);
}
mutex_unlock(&uverbs_file->ucontext_lock);
@@ -1092,7 +1094,7 @@ static int ib_uverbs_create_uapi(struct ib_device *device,
return 0;
}
static void ib_uverbs_add_one(struct ib_device *device)
static int ib_uverbs_add_one(struct ib_device *device)
{
int devnum;
dev_t base;
@@ -1100,16 +1102,16 @@ static void ib_uverbs_add_one(struct ib_device *device)
int ret;
if (!device->ops.alloc_ucontext)
return;
return -EOPNOTSUPP;
uverbs_dev = kzalloc(sizeof(*uverbs_dev), GFP_KERNEL);
if (!uverbs_dev)
return;
return -ENOMEM;
ret = init_srcu_struct(&uverbs_dev->disassociate_srcu);
if (ret) {
kfree(uverbs_dev);
return;
return -ENOMEM;
}
device_initialize(&uverbs_dev->dev);
@@ -1129,15 +1131,18 @@ static void ib_uverbs_add_one(struct ib_device *device)
devnum = ida_alloc_max(&uverbs_ida, IB_UVERBS_MAX_DEVICES - 1,
GFP_KERNEL);
if (devnum < 0)
if (devnum < 0) {
ret = -ENOMEM;
goto err;
}
uverbs_dev->devnum = devnum;
if (devnum >= IB_UVERBS_NUM_FIXED_MINOR)
base = dynamic_uverbs_dev + devnum - IB_UVERBS_NUM_FIXED_MINOR;
else
base = IB_UVERBS_BASE_DEV + devnum;
if (ib_uverbs_create_uapi(device, uverbs_dev))
ret = ib_uverbs_create_uapi(device, uverbs_dev);
if (ret)
goto err_uapi;
uverbs_dev->dev.devt = base;
@@ -1152,7 +1157,7 @@ static void ib_uverbs_add_one(struct ib_device *device)
goto err_uapi;
ib_set_client_data(device, &uverbs_client, uverbs_dev);
return;
return 0;
err_uapi:
ida_free(&uverbs_ida, devnum);
@@ -1161,7 +1166,7 @@ err:
ib_uverbs_comp_dev(uverbs_dev);
wait_for_completion(&uverbs_dev->comp);
put_device(&uverbs_dev->dev);
return;
return ret;
}
static void ib_uverbs_free_hw_resources(struct ib_uverbs_device *uverbs_dev,
@@ -1201,9 +1206,6 @@ static void ib_uverbs_remove_one(struct ib_device *device, void *client_data)
struct ib_uverbs_device *uverbs_dev = client_data;
int wait_clients = 1;
if (!uverbs_dev)
return;
cdev_device_del(&uverbs_dev->cdev, &uverbs_dev->dev);
ida_free(&uverbs_ida, uverbs_dev->devnum);


@@ -75,40 +75,6 @@ static int uverbs_free_mw(struct ib_uobject *uobject,
return uverbs_dealloc_mw((struct ib_mw *)uobject->object);
}
static int uverbs_free_qp(struct ib_uobject *uobject,
enum rdma_remove_reason why,
struct uverbs_attr_bundle *attrs)
{
struct ib_qp *qp = uobject->object;
struct ib_uqp_object *uqp =
container_of(uobject, struct ib_uqp_object, uevent.uobject);
int ret;
/*
* If this is a user triggered destroy then do not allow destruction
* until the user cleans up all the mcast bindings. Unlike in other
* places we forcibly clean up the mcast attachments for !DESTROY
* because the mcast attaches are not uobjects and will not be
* destroyed by anything else during cleanup processing.
*/
if (why == RDMA_REMOVE_DESTROY) {
if (!list_empty(&uqp->mcast_list))
return -EBUSY;
} else if (qp == qp->real_qp) {
ib_uverbs_detach_umcast(qp, uqp);
}
ret = ib_destroy_qp_user(qp, &attrs->driver_udata);
if (ib_is_destroy_retryable(ret, why, uobject))
return ret;
if (uqp->uxrcd)
atomic_dec(&uqp->uxrcd->refcnt);
ib_uverbs_release_uevent(&uqp->uevent);
return ret;
}
static int uverbs_free_rwq_ind_tbl(struct ib_uobject *uobject,
enum rdma_remove_reason why,
struct uverbs_attr_bundle *attrs)
@@ -125,48 +91,6 @@ static int uverbs_free_rwq_ind_tbl(struct ib_uobject *uobject,
return ret;
}
static int uverbs_free_wq(struct ib_uobject *uobject,
enum rdma_remove_reason why,
struct uverbs_attr_bundle *attrs)
{
struct ib_wq *wq = uobject->object;
struct ib_uwq_object *uwq =
container_of(uobject, struct ib_uwq_object, uevent.uobject);
int ret;
ret = ib_destroy_wq(wq, &attrs->driver_udata);
if (ib_is_destroy_retryable(ret, why, uobject))
return ret;
ib_uverbs_release_uevent(&uwq->uevent);
return ret;
}
static int uverbs_free_srq(struct ib_uobject *uobject,
enum rdma_remove_reason why,
struct uverbs_attr_bundle *attrs)
{
struct ib_srq *srq = uobject->object;
struct ib_uevent_object *uevent =
container_of(uobject, struct ib_uevent_object, uobject);
enum ib_srq_type srq_type = srq->srq_type;
int ret;
ret = ib_destroy_srq_user(srq, &attrs->driver_udata);
if (ib_is_destroy_retryable(ret, why, uobject))
return ret;
if (srq_type == IB_SRQT_XRC) {
struct ib_usrq_object *us =
container_of(uevent, struct ib_usrq_object, uevent);
atomic_dec(&us->uxrcd->refcnt);
}
ib_uverbs_release_uevent(uevent);
return ret;
}
static int uverbs_free_xrcd(struct ib_uobject *uobject,
enum rdma_remove_reason why,
struct uverbs_attr_bundle *attrs)
@@ -252,10 +176,6 @@ DECLARE_UVERBS_NAMED_OBJECT(
"[infinibandevent]",
O_RDONLY));
DECLARE_UVERBS_NAMED_OBJECT(
UVERBS_OBJECT_QP,
UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_uqp_object), uverbs_free_qp));
DECLARE_UVERBS_NAMED_METHOD_DESTROY(
UVERBS_METHOD_MW_DESTROY,
UVERBS_ATTR_IDR(UVERBS_ATTR_DESTROY_MW_HANDLE,
@@ -267,11 +187,6 @@ DECLARE_UVERBS_NAMED_OBJECT(UVERBS_OBJECT_MW,
UVERBS_TYPE_ALLOC_IDR(uverbs_free_mw),
&UVERBS_METHOD(UVERBS_METHOD_MW_DESTROY));
DECLARE_UVERBS_NAMED_OBJECT(
UVERBS_OBJECT_SRQ,
UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_usrq_object),
uverbs_free_srq));
DECLARE_UVERBS_NAMED_METHOD_DESTROY(
UVERBS_METHOD_AH_DESTROY,
UVERBS_ATTR_IDR(UVERBS_ATTR_DESTROY_AH_HANDLE,
@@ -296,10 +211,6 @@ DECLARE_UVERBS_NAMED_OBJECT(
uverbs_free_flow),
&UVERBS_METHOD(UVERBS_METHOD_FLOW_DESTROY));
DECLARE_UVERBS_NAMED_OBJECT(
UVERBS_OBJECT_WQ,
UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_uwq_object), uverbs_free_wq));
DECLARE_UVERBS_NAMED_METHOD_DESTROY(
UVERBS_METHOD_RWQ_IND_TBL_DESTROY,
UVERBS_ATTR_IDR(UVERBS_ATTR_DESTROY_RWQ_IND_TBL_HANDLE,
@@ -340,18 +251,12 @@ const struct uapi_definition uverbs_def_obj_intf[] = {
UAPI_DEF_OBJ_NEEDS_FN(dealloc_pd)),
UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_COMP_CHANNEL,
UAPI_DEF_OBJ_NEEDS_FN(dealloc_pd)),
UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_QP,
UAPI_DEF_OBJ_NEEDS_FN(destroy_qp)),
UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_AH,
UAPI_DEF_OBJ_NEEDS_FN(destroy_ah)),
UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_MW,
UAPI_DEF_OBJ_NEEDS_FN(dealloc_mw)),
UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_SRQ,
UAPI_DEF_OBJ_NEEDS_FN(destroy_srq)),
UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_FLOW,
UAPI_DEF_OBJ_NEEDS_FN(destroy_flow)),
UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_WQ,
UAPI_DEF_OBJ_NEEDS_FN(destroy_wq)),
UAPI_DEF_CHAIN_OBJ_TREE_NAMED(
UVERBS_OBJECT_RWQ_IND_TBL,
UAPI_DEF_OBJ_NEEDS_FN(destroy_rwq_ind_table)),


@@ -100,6 +100,9 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
uverbs_uobject_get(ev_file_uobj);
}
obj->uevent.event_file = ib_uverbs_get_async_event(
attrs, UVERBS_ATTR_CREATE_CQ_EVENT_FD);
if (attr.comp_vector >= attrs->ufile->device->num_comp_vectors) {
ret = -EINVAL;
goto err_event_file;
@@ -129,19 +132,17 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(
obj->uevent.uobject.object = cq;
obj->uevent.uobject.user_handle = user_handle;
rdma_restrack_uadd(&cq->res);
uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_CREATE_CQ_HANDLE);
ret = uverbs_copy_to(attrs, UVERBS_ATTR_CREATE_CQ_RESP_CQE, &cq->cqe,
sizeof(cq->cqe));
if (ret)
goto err_cq;
return ret;
return 0;
err_cq:
ib_destroy_cq_user(cq, uverbs_get_cleared_udata(attrs));
cq = NULL;
err_free:
kfree(cq);
err_event_file:
if (obj->uevent.event_file)
uverbs_uobject_put(&obj->uevent.event_file->uobj);
if (ev_file)
uverbs_uobject_put(ev_file_uobj);
return ret;
@@ -171,6 +172,10 @@ DECLARE_UVERBS_NAMED_METHOD(
UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_CREATE_CQ_RESP_CQE,
UVERBS_ATTR_TYPE(u32),
UA_MANDATORY),
UVERBS_ATTR_FD(UVERBS_ATTR_CREATE_CQ_EVENT_FD,
UVERBS_OBJECT_ASYNC_EVENT,
UVERBS_ACCESS_READ,
UA_OPTIONAL),
UVERBS_ATTR_UHW());
static int UVERBS_HANDLER(UVERBS_METHOD_CQ_DESTROY)(


@@ -136,21 +136,15 @@ static int UVERBS_HANDLER(UVERBS_METHOD_DM_MR_REG)(
uobj->object = mr;
uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_REG_DM_MR_HANDLE);
ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DM_MR_RESP_LKEY, &mr->lkey,
sizeof(mr->lkey));
if (ret)
goto err_dereg;
return ret;
ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DM_MR_RESP_RKEY,
&mr->rkey, sizeof(mr->rkey));
if (ret)
goto err_dereg;
return 0;
err_dereg:
ib_dereg_mr_user(mr, uverbs_get_cleared_udata(attrs));
return ret;
}


@@ -0,0 +1,401 @@
// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
/*
* Copyright (c) 2020, Mellanox Technologies inc. All rights reserved.
*/
#include <rdma/uverbs_std_types.h>
#include "rdma_core.h"
#include "uverbs.h"
#include "core_priv.h"
static int uverbs_free_qp(struct ib_uobject *uobject,
enum rdma_remove_reason why,
struct uverbs_attr_bundle *attrs)
{
struct ib_qp *qp = uobject->object;
struct ib_uqp_object *uqp =
container_of(uobject, struct ib_uqp_object, uevent.uobject);
int ret;
/*
* If this is a user triggered destroy then do not allow destruction
* until the user cleans up all the mcast bindings. Unlike in other
* places we forcibly clean up the mcast attachments for !DESTROY
* because the mcast attaches are not uobjects and will not be
* destroyed by anything else during cleanup processing.
*/
if (why == RDMA_REMOVE_DESTROY) {
if (!list_empty(&uqp->mcast_list))
return -EBUSY;
} else if (qp == qp->real_qp) {
ib_uverbs_detach_umcast(qp, uqp);
}
ret = ib_destroy_qp_user(qp, &attrs->driver_udata);
if (ib_is_destroy_retryable(ret, why, uobject))
return ret;
if (uqp->uxrcd)
atomic_dec(&uqp->uxrcd->refcnt);
ib_uverbs_release_uevent(&uqp->uevent);
return ret;
}
static int check_creation_flags(enum ib_qp_type qp_type,
u32 create_flags)
{
create_flags &= ~IB_UVERBS_QP_CREATE_SQ_SIG_ALL;
if (!create_flags || qp_type == IB_QPT_DRIVER)
return 0;
if (qp_type != IB_QPT_RAW_PACKET && qp_type != IB_QPT_UD)
return -EINVAL;
if ((create_flags & IB_UVERBS_QP_CREATE_SCATTER_FCS ||
create_flags & IB_UVERBS_QP_CREATE_CVLAN_STRIPPING) &&
qp_type != IB_QPT_RAW_PACKET)
return -EINVAL;
return 0;
}
static void set_caps(struct ib_qp_init_attr *attr,
struct ib_uverbs_qp_cap *cap, bool req)
{
if (req) {
attr->cap.max_send_wr = cap->max_send_wr;
attr->cap.max_recv_wr = cap->max_recv_wr;
attr->cap.max_send_sge = cap->max_send_sge;
attr->cap.max_recv_sge = cap->max_recv_sge;
attr->cap.max_inline_data = cap->max_inline_data;
} else {
cap->max_send_wr = attr->cap.max_send_wr;
cap->max_recv_wr = attr->cap.max_recv_wr;
cap->max_send_sge = attr->cap.max_send_sge;
cap->max_recv_sge = attr->cap.max_recv_sge;
cap->max_inline_data = attr->cap.max_inline_data;
}
}
static int UVERBS_HANDLER(UVERBS_METHOD_QP_CREATE)(
struct uverbs_attr_bundle *attrs)
{
struct ib_uqp_object *obj = container_of(
uverbs_attr_get_uobject(attrs, UVERBS_ATTR_CREATE_QP_HANDLE),
typeof(*obj), uevent.uobject);
struct ib_qp_init_attr attr = {};
struct ib_uverbs_qp_cap cap = {};
struct ib_rwq_ind_table *rwq_ind_tbl = NULL;
struct ib_qp *qp;
struct ib_pd *pd = NULL;
struct ib_srq *srq = NULL;
struct ib_cq *recv_cq = NULL;
struct ib_cq *send_cq = NULL;
struct ib_xrcd *xrcd = NULL;
struct ib_uobject *xrcd_uobj = NULL;
struct ib_device *device;
u64 user_handle;
int ret;
ret = uverbs_copy_from_or_zero(&cap, attrs,
UVERBS_ATTR_CREATE_QP_CAP);
if (!ret)
ret = uverbs_copy_from(&user_handle, attrs,
UVERBS_ATTR_CREATE_QP_USER_HANDLE);
if (!ret)
ret = uverbs_get_const(&attr.qp_type, attrs,
UVERBS_ATTR_CREATE_QP_TYPE);
if (ret)
return ret;
switch (attr.qp_type) {
case IB_QPT_XRC_TGT:
if (uverbs_attr_is_valid(attrs,
UVERBS_ATTR_CREATE_QP_RECV_CQ_HANDLE) ||
uverbs_attr_is_valid(attrs,
UVERBS_ATTR_CREATE_QP_SEND_CQ_HANDLE) ||
uverbs_attr_is_valid(attrs,
UVERBS_ATTR_CREATE_QP_PD_HANDLE) ||
uverbs_attr_is_valid(attrs,
UVERBS_ATTR_CREATE_QP_IND_TABLE_HANDLE))
return -EINVAL;
xrcd_uobj = uverbs_attr_get_uobject(attrs,
UVERBS_ATTR_CREATE_QP_XRCD_HANDLE);
if (IS_ERR(xrcd_uobj))
return PTR_ERR(xrcd_uobj);
xrcd = (struct ib_xrcd *)xrcd_uobj->object;
if (!xrcd)
return -EINVAL;
device = xrcd->device;
break;
case IB_UVERBS_QPT_RAW_PACKET:
if (!capable(CAP_NET_RAW))
return -EPERM;
fallthrough;
case IB_UVERBS_QPT_RC:
case IB_UVERBS_QPT_UC:
case IB_UVERBS_QPT_UD:
case IB_UVERBS_QPT_XRC_INI:
case IB_UVERBS_QPT_DRIVER:
if (uverbs_attr_is_valid(attrs,
UVERBS_ATTR_CREATE_QP_XRCD_HANDLE) ||
(uverbs_attr_is_valid(attrs,
UVERBS_ATTR_CREATE_QP_SRQ_HANDLE) &&
attr.qp_type == IB_QPT_XRC_INI))
return -EINVAL;
pd = uverbs_attr_get_obj(attrs,
UVERBS_ATTR_CREATE_QP_PD_HANDLE);
if (IS_ERR(pd))
return PTR_ERR(pd);
rwq_ind_tbl = uverbs_attr_get_obj(attrs,
UVERBS_ATTR_CREATE_QP_IND_TABLE_HANDLE);
if (!IS_ERR(rwq_ind_tbl)) {
if (cap.max_recv_wr || cap.max_recv_sge ||
uverbs_attr_is_valid(attrs,
UVERBS_ATTR_CREATE_QP_RECV_CQ_HANDLE) ||
uverbs_attr_is_valid(attrs,
UVERBS_ATTR_CREATE_QP_SRQ_HANDLE))
return -EINVAL;
/* send_cq is optional */
if (cap.max_send_wr) {
send_cq = uverbs_attr_get_obj(attrs,
UVERBS_ATTR_CREATE_QP_SEND_CQ_HANDLE);
if (IS_ERR(send_cq))
return PTR_ERR(send_cq);
}
attr.rwq_ind_tbl = rwq_ind_tbl;
} else {
send_cq = uverbs_attr_get_obj(attrs,
UVERBS_ATTR_CREATE_QP_SEND_CQ_HANDLE);
if (IS_ERR(send_cq))
return PTR_ERR(send_cq);
if (attr.qp_type != IB_QPT_XRC_INI) {
recv_cq = uverbs_attr_get_obj(attrs,
UVERBS_ATTR_CREATE_QP_RECV_CQ_HANDLE);
if (IS_ERR(recv_cq))
return PTR_ERR(recv_cq);
}
}
device = pd->device;
break;
default:
return -EINVAL;
}
ret = uverbs_get_flags32(&attr.create_flags, attrs,
UVERBS_ATTR_CREATE_QP_FLAGS,
IB_UVERBS_QP_CREATE_BLOCK_MULTICAST_LOOPBACK |
IB_UVERBS_QP_CREATE_SCATTER_FCS |
IB_UVERBS_QP_CREATE_CVLAN_STRIPPING |
IB_UVERBS_QP_CREATE_PCI_WRITE_END_PADDING |
IB_UVERBS_QP_CREATE_SQ_SIG_ALL);
if (ret)
return ret;
ret = check_creation_flags(attr.qp_type, attr.create_flags);
if (ret)
return ret;
if (uverbs_attr_is_valid(attrs,
UVERBS_ATTR_CREATE_QP_SOURCE_QPN)) {
ret = uverbs_copy_from(&attr.source_qpn, attrs,
UVERBS_ATTR_CREATE_QP_SOURCE_QPN);
if (ret)
return ret;
attr.create_flags |= IB_QP_CREATE_SOURCE_QPN;
}
srq = uverbs_attr_get_obj(attrs,
UVERBS_ATTR_CREATE_QP_SRQ_HANDLE);
if (!IS_ERR(srq)) {
if ((srq->srq_type == IB_SRQT_XRC &&
attr.qp_type != IB_QPT_XRC_TGT) ||
(srq->srq_type != IB_SRQT_XRC &&
attr.qp_type == IB_QPT_XRC_TGT))
return -EINVAL;
attr.srq = srq;
}
obj->uevent.event_file = ib_uverbs_get_async_event(attrs,
UVERBS_ATTR_CREATE_QP_EVENT_FD);
INIT_LIST_HEAD(&obj->uevent.event_list);
INIT_LIST_HEAD(&obj->mcast_list);
obj->uevent.uobject.user_handle = user_handle;
attr.event_handler = ib_uverbs_qp_event_handler;
attr.send_cq = send_cq;
attr.recv_cq = recv_cq;
attr.xrcd = xrcd;
if (attr.create_flags & IB_UVERBS_QP_CREATE_SQ_SIG_ALL) {
/* This create flag is uverbs-specific and must be masked out
 * before calling the drivers. It was added to avoid a dedicated
 * user attribute just for this when using ioctl.
 */
attr.create_flags &= ~IB_UVERBS_QP_CREATE_SQ_SIG_ALL;
attr.sq_sig_type = IB_SIGNAL_ALL_WR;
} else {
attr.sq_sig_type = IB_SIGNAL_REQ_WR;
}
set_caps(&attr, &cap, true);
mutex_init(&obj->mcast_lock);
if (attr.qp_type == IB_QPT_XRC_TGT)
qp = ib_create_qp(pd, &attr);
else
qp = _ib_create_qp(device, pd, &attr, &attrs->driver_udata,
obj);
if (IS_ERR(qp)) {
ret = PTR_ERR(qp);
goto err_put;
}
if (attr.qp_type != IB_QPT_XRC_TGT) {
atomic_inc(&pd->usecnt);
if (attr.send_cq)
atomic_inc(&attr.send_cq->usecnt);
if (attr.recv_cq)
atomic_inc(&attr.recv_cq->usecnt);
if (attr.srq)
atomic_inc(&attr.srq->usecnt);
if (attr.rwq_ind_tbl)
atomic_inc(&attr.rwq_ind_tbl->usecnt);
} else {
obj->uxrcd = container_of(xrcd_uobj, struct ib_uxrcd_object,
uobject);
atomic_inc(&obj->uxrcd->refcnt);
/* It is done in _ib_create_qp for other QP types */
qp->uobject = obj;
}
obj->uevent.uobject.object = qp;
uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_CREATE_QP_HANDLE);
if (attr.qp_type != IB_QPT_XRC_TGT) {
ret = ib_create_qp_security(qp, device);
if (ret)
return ret;
}
set_caps(&attr, &cap, false);
ret = uverbs_copy_to_struct_or_zero(attrs,
UVERBS_ATTR_CREATE_QP_RESP_CAP, &cap,
sizeof(cap));
if (ret)
return ret;
ret = uverbs_copy_to(attrs, UVERBS_ATTR_CREATE_QP_RESP_QP_NUM,
&qp->qp_num,
sizeof(qp->qp_num));
return ret;
err_put:
if (obj->uevent.event_file)
uverbs_uobject_put(&obj->uevent.event_file->uobj);
return ret;
};
DECLARE_UVERBS_NAMED_METHOD(
UVERBS_METHOD_QP_CREATE,
UVERBS_ATTR_IDR(UVERBS_ATTR_CREATE_QP_HANDLE,
UVERBS_OBJECT_QP,
UVERBS_ACCESS_NEW,
UA_MANDATORY),
UVERBS_ATTR_IDR(UVERBS_ATTR_CREATE_QP_XRCD_HANDLE,
UVERBS_OBJECT_XRCD,
UVERBS_ACCESS_READ,
UA_OPTIONAL),
UVERBS_ATTR_IDR(UVERBS_ATTR_CREATE_QP_PD_HANDLE,
UVERBS_OBJECT_PD,
UVERBS_ACCESS_READ,
UA_OPTIONAL),
UVERBS_ATTR_IDR(UVERBS_ATTR_CREATE_QP_SRQ_HANDLE,
UVERBS_OBJECT_SRQ,
UVERBS_ACCESS_READ,
UA_OPTIONAL),
UVERBS_ATTR_IDR(UVERBS_ATTR_CREATE_QP_SEND_CQ_HANDLE,
UVERBS_OBJECT_CQ,
UVERBS_ACCESS_READ,
UA_OPTIONAL),
UVERBS_ATTR_IDR(UVERBS_ATTR_CREATE_QP_RECV_CQ_HANDLE,
UVERBS_OBJECT_CQ,
UVERBS_ACCESS_READ,
UA_OPTIONAL),
UVERBS_ATTR_IDR(UVERBS_ATTR_CREATE_QP_IND_TABLE_HANDLE,
UVERBS_OBJECT_RWQ_IND_TBL,
UVERBS_ACCESS_READ,
UA_OPTIONAL),
UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_QP_USER_HANDLE,
UVERBS_ATTR_TYPE(u64),
UA_MANDATORY),
UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_QP_CAP,
UVERBS_ATTR_STRUCT(struct ib_uverbs_qp_cap,
max_inline_data),
UA_MANDATORY),
UVERBS_ATTR_CONST_IN(UVERBS_ATTR_CREATE_QP_TYPE,
enum ib_uverbs_qp_type,
UA_MANDATORY),
UVERBS_ATTR_FLAGS_IN(UVERBS_ATTR_CREATE_QP_FLAGS,
enum ib_uverbs_qp_create_flags,
UA_OPTIONAL),
UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_QP_SOURCE_QPN,
UVERBS_ATTR_TYPE(u32),
UA_OPTIONAL),
UVERBS_ATTR_FD(UVERBS_ATTR_CREATE_QP_EVENT_FD,
UVERBS_OBJECT_ASYNC_EVENT,
UVERBS_ACCESS_READ,
UA_OPTIONAL),
UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_CREATE_QP_RESP_CAP,
UVERBS_ATTR_STRUCT(struct ib_uverbs_qp_cap,
max_inline_data),
UA_MANDATORY),
UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_CREATE_QP_RESP_QP_NUM,
UVERBS_ATTR_TYPE(u32),
UA_MANDATORY),
UVERBS_ATTR_UHW());
static int UVERBS_HANDLER(UVERBS_METHOD_QP_DESTROY)(
struct uverbs_attr_bundle *attrs)
{
struct ib_uobject *uobj =
uverbs_attr_get_uobject(attrs, UVERBS_ATTR_DESTROY_QP_HANDLE);
struct ib_uqp_object *obj =
container_of(uobj, struct ib_uqp_object, uevent.uobject);
struct ib_uverbs_destroy_qp_resp resp = {
.events_reported = obj->uevent.events_reported
};
return uverbs_copy_to(attrs, UVERBS_ATTR_DESTROY_QP_RESP, &resp,
sizeof(resp));
}
DECLARE_UVERBS_NAMED_METHOD(
UVERBS_METHOD_QP_DESTROY,
UVERBS_ATTR_IDR(UVERBS_ATTR_DESTROY_QP_HANDLE,
UVERBS_OBJECT_QP,
UVERBS_ACCESS_DESTROY,
UA_MANDATORY),
UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_DESTROY_QP_RESP,
UVERBS_ATTR_TYPE(struct ib_uverbs_destroy_qp_resp),
UA_MANDATORY));
DECLARE_UVERBS_NAMED_OBJECT(
UVERBS_OBJECT_QP,
UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_uqp_object), uverbs_free_qp),
&UVERBS_METHOD(UVERBS_METHOD_QP_CREATE),
&UVERBS_METHOD(UVERBS_METHOD_QP_DESTROY));
const struct uapi_definition uverbs_def_obj_qp[] = {
UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_QP,
UAPI_DEF_OBJ_NEEDS_FN(destroy_qp)),
{}
};


@@ -0,0 +1,234 @@
// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
/*
* Copyright (c) 2020, Mellanox Technologies inc. All rights reserved.
*/
#include <rdma/uverbs_std_types.h>
#include "rdma_core.h"
#include "uverbs.h"
static int uverbs_free_srq(struct ib_uobject *uobject,
enum rdma_remove_reason why,
struct uverbs_attr_bundle *attrs)
{
struct ib_srq *srq = uobject->object;
struct ib_uevent_object *uevent =
container_of(uobject, struct ib_uevent_object, uobject);
enum ib_srq_type srq_type = srq->srq_type;
int ret;
ret = ib_destroy_srq_user(srq, &attrs->driver_udata);
if (ib_is_destroy_retryable(ret, why, uobject))
return ret;
if (srq_type == IB_SRQT_XRC) {
struct ib_usrq_object *us =
container_of(uobject, struct ib_usrq_object,
uevent.uobject);
atomic_dec(&us->uxrcd->refcnt);
}
ib_uverbs_release_uevent(uevent);
return ret;
}
static int UVERBS_HANDLER(UVERBS_METHOD_SRQ_CREATE)(
struct uverbs_attr_bundle *attrs)
{
struct ib_usrq_object *obj = container_of(
uverbs_attr_get_uobject(attrs, UVERBS_ATTR_CREATE_SRQ_HANDLE),
typeof(*obj), uevent.uobject);
struct ib_pd *pd =
uverbs_attr_get_obj(attrs, UVERBS_ATTR_CREATE_SRQ_PD_HANDLE);
struct ib_srq_init_attr attr = {};
struct ib_uobject *xrcd_uobj;
struct ib_srq *srq;
u64 user_handle;
int ret;
ret = uverbs_copy_from(&attr.attr.max_sge, attrs,
UVERBS_ATTR_CREATE_SRQ_MAX_SGE);
if (!ret)
ret = uverbs_copy_from(&attr.attr.max_wr, attrs,
UVERBS_ATTR_CREATE_SRQ_MAX_WR);
if (!ret)
ret = uverbs_copy_from(&attr.attr.srq_limit, attrs,
UVERBS_ATTR_CREATE_SRQ_LIMIT);
if (!ret)
ret = uverbs_copy_from(&user_handle, attrs,
UVERBS_ATTR_CREATE_SRQ_USER_HANDLE);
if (!ret)
ret = uverbs_get_const(&attr.srq_type, attrs,
UVERBS_ATTR_CREATE_SRQ_TYPE);
if (ret)
return ret;
if (ib_srq_has_cq(attr.srq_type)) {
attr.ext.cq = uverbs_attr_get_obj(attrs,
UVERBS_ATTR_CREATE_SRQ_CQ_HANDLE);
if (IS_ERR(attr.ext.cq))
return PTR_ERR(attr.ext.cq);
}
switch (attr.srq_type) {
case IB_UVERBS_SRQT_XRC:
xrcd_uobj = uverbs_attr_get_uobject(attrs,
UVERBS_ATTR_CREATE_SRQ_XRCD_HANDLE);
if (IS_ERR(xrcd_uobj))
return PTR_ERR(xrcd_uobj);
attr.ext.xrc.xrcd = (struct ib_xrcd *)xrcd_uobj->object;
if (!attr.ext.xrc.xrcd)
return -EINVAL;
obj->uxrcd = container_of(xrcd_uobj, struct ib_uxrcd_object,
uobject);
atomic_inc(&obj->uxrcd->refcnt);
break;
case IB_UVERBS_SRQT_TM:
ret = uverbs_copy_from(&attr.ext.tag_matching.max_num_tags,
attrs,
UVERBS_ATTR_CREATE_SRQ_MAX_NUM_TAGS);
if (ret)
return ret;
break;
case IB_UVERBS_SRQT_BASIC:
break;
default:
return -EINVAL;
}
obj->uevent.event_file = ib_uverbs_get_async_event(attrs,
UVERBS_ATTR_CREATE_SRQ_EVENT_FD);
INIT_LIST_HEAD(&obj->uevent.event_list);
attr.event_handler = ib_uverbs_srq_event_handler;
obj->uevent.uobject.user_handle = user_handle;
srq = ib_create_srq_user(pd, &attr, obj, &attrs->driver_udata);
if (IS_ERR(srq)) {
ret = PTR_ERR(srq);
goto err;
}
obj->uevent.uobject.object = srq;
uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_CREATE_SRQ_HANDLE);
ret = uverbs_copy_to(attrs, UVERBS_ATTR_CREATE_SRQ_RESP_MAX_WR,
&attr.attr.max_wr,
sizeof(attr.attr.max_wr));
if (ret)
return ret;
ret = uverbs_copy_to(attrs, UVERBS_ATTR_CREATE_SRQ_RESP_MAX_SGE,
&attr.attr.max_sge,
sizeof(attr.attr.max_sge));
if (ret)
return ret;
if (attr.srq_type == IB_SRQT_XRC) {
ret = uverbs_copy_to(attrs,
UVERBS_ATTR_CREATE_SRQ_RESP_SRQ_NUM,
&srq->ext.xrc.srq_num,
sizeof(srq->ext.xrc.srq_num));
if (ret)
return ret;
}
return 0;
err:
if (obj->uevent.event_file)
uverbs_uobject_put(&obj->uevent.event_file->uobj);
if (attr.srq_type == IB_SRQT_XRC)
atomic_dec(&obj->uxrcd->refcnt);
return ret;
};
DECLARE_UVERBS_NAMED_METHOD(
UVERBS_METHOD_SRQ_CREATE,
UVERBS_ATTR_IDR(UVERBS_ATTR_CREATE_SRQ_HANDLE,
UVERBS_OBJECT_SRQ,
UVERBS_ACCESS_NEW,
UA_MANDATORY),
UVERBS_ATTR_IDR(UVERBS_ATTR_CREATE_SRQ_PD_HANDLE,
UVERBS_OBJECT_PD,
UVERBS_ACCESS_READ,
UA_MANDATORY),
UVERBS_ATTR_CONST_IN(UVERBS_ATTR_CREATE_SRQ_TYPE,
enum ib_uverbs_srq_type,
UA_MANDATORY),
UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_SRQ_USER_HANDLE,
UVERBS_ATTR_TYPE(u64),
UA_MANDATORY),
UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_SRQ_MAX_WR,
UVERBS_ATTR_TYPE(u32),
UA_MANDATORY),
UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_SRQ_MAX_SGE,
UVERBS_ATTR_TYPE(u32),
UA_MANDATORY),
UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_SRQ_LIMIT,
UVERBS_ATTR_TYPE(u32),
UA_MANDATORY),
UVERBS_ATTR_IDR(UVERBS_ATTR_CREATE_SRQ_XRCD_HANDLE,
UVERBS_OBJECT_XRCD,
UVERBS_ACCESS_READ,
UA_OPTIONAL),
UVERBS_ATTR_IDR(UVERBS_ATTR_CREATE_SRQ_CQ_HANDLE,
UVERBS_OBJECT_CQ,
UVERBS_ACCESS_READ,
UA_OPTIONAL),
UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_SRQ_MAX_NUM_TAGS,
UVERBS_ATTR_TYPE(u32),
UA_OPTIONAL),
UVERBS_ATTR_FD(UVERBS_ATTR_CREATE_SRQ_EVENT_FD,
UVERBS_OBJECT_ASYNC_EVENT,
UVERBS_ACCESS_READ,
UA_OPTIONAL),
UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_CREATE_SRQ_RESP_MAX_WR,
UVERBS_ATTR_TYPE(u32),
UA_MANDATORY),
UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_CREATE_SRQ_RESP_MAX_SGE,
UVERBS_ATTR_TYPE(u32),
UA_MANDATORY),
UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_CREATE_SRQ_RESP_SRQ_NUM,
UVERBS_ATTR_TYPE(u32),
UA_OPTIONAL),
UVERBS_ATTR_UHW());
static int UVERBS_HANDLER(UVERBS_METHOD_SRQ_DESTROY)(
struct uverbs_attr_bundle *attrs)
{
struct ib_uobject *uobj =
uverbs_attr_get_uobject(attrs, UVERBS_ATTR_DESTROY_SRQ_HANDLE);
struct ib_usrq_object *obj =
container_of(uobj, struct ib_usrq_object, uevent.uobject);
struct ib_uverbs_destroy_srq_resp resp = {
.events_reported = obj->uevent.events_reported
};
return uverbs_copy_to(attrs, UVERBS_ATTR_DESTROY_SRQ_RESP, &resp,
sizeof(resp));
}
DECLARE_UVERBS_NAMED_METHOD(
UVERBS_METHOD_SRQ_DESTROY,
UVERBS_ATTR_IDR(UVERBS_ATTR_DESTROY_SRQ_HANDLE,
UVERBS_OBJECT_SRQ,
UVERBS_ACCESS_DESTROY,
UA_MANDATORY),
UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_DESTROY_SRQ_RESP,
UVERBS_ATTR_TYPE(struct ib_uverbs_destroy_srq_resp),
UA_MANDATORY));
DECLARE_UVERBS_NAMED_OBJECT(
UVERBS_OBJECT_SRQ,
UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_usrq_object),
uverbs_free_srq),
&UVERBS_METHOD(UVERBS_METHOD_SRQ_CREATE),
&UVERBS_METHOD(UVERBS_METHOD_SRQ_DESTROY)
);
const struct uapi_definition uverbs_def_obj_srq[] = {
UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_SRQ,
UAPI_DEF_OBJ_NEEDS_FN(destroy_srq)),
{}
};


@@ -0,0 +1,194 @@
// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
/*
* Copyright (c) 2020, Mellanox Technologies inc. All rights reserved.
*/
#include <rdma/uverbs_std_types.h>
#include "rdma_core.h"
#include "uverbs.h"
static int uverbs_free_wq(struct ib_uobject *uobject,
enum rdma_remove_reason why,
struct uverbs_attr_bundle *attrs)
{
struct ib_wq *wq = uobject->object;
struct ib_uwq_object *uwq =
container_of(uobject, struct ib_uwq_object, uevent.uobject);
int ret;
ret = ib_destroy_wq(wq, &attrs->driver_udata);
if (ib_is_destroy_retryable(ret, why, uobject))
return ret;
ib_uverbs_release_uevent(&uwq->uevent);
return ret;
}
static int UVERBS_HANDLER(UVERBS_METHOD_WQ_CREATE)(
struct uverbs_attr_bundle *attrs)
{
struct ib_uwq_object *obj = container_of(
uverbs_attr_get_uobject(attrs, UVERBS_ATTR_CREATE_WQ_HANDLE),
typeof(*obj), uevent.uobject);
struct ib_pd *pd =
uverbs_attr_get_obj(attrs, UVERBS_ATTR_CREATE_WQ_PD_HANDLE);
struct ib_cq *cq =
uverbs_attr_get_obj(attrs, UVERBS_ATTR_CREATE_WQ_CQ_HANDLE);
struct ib_wq_init_attr wq_init_attr = {};
struct ib_wq *wq;
u64 user_handle;
int ret;
ret = uverbs_get_flags32(&wq_init_attr.create_flags, attrs,
UVERBS_ATTR_CREATE_WQ_FLAGS,
IB_UVERBS_WQ_FLAGS_CVLAN_STRIPPING |
IB_UVERBS_WQ_FLAGS_SCATTER_FCS |
IB_UVERBS_WQ_FLAGS_DELAY_DROP |
IB_UVERBS_WQ_FLAGS_PCI_WRITE_END_PADDING);
if (!ret)
ret = uverbs_copy_from(&wq_init_attr.max_sge, attrs,
UVERBS_ATTR_CREATE_WQ_MAX_SGE);
if (!ret)
ret = uverbs_copy_from(&wq_init_attr.max_wr, attrs,
UVERBS_ATTR_CREATE_WQ_MAX_WR);
if (!ret)
ret = uverbs_copy_from(&user_handle, attrs,
UVERBS_ATTR_CREATE_WQ_USER_HANDLE);
if (!ret)
ret = uverbs_get_const(&wq_init_attr.wq_type, attrs,
UVERBS_ATTR_CREATE_WQ_TYPE);
if (ret)
return ret;
if (wq_init_attr.wq_type != IB_WQT_RQ)
return -EINVAL;
obj->uevent.event_file = ib_uverbs_get_async_event(attrs,
UVERBS_ATTR_CREATE_WQ_EVENT_FD);
obj->uevent.uobject.user_handle = user_handle;
INIT_LIST_HEAD(&obj->uevent.event_list);
wq_init_attr.event_handler = ib_uverbs_wq_event_handler;
wq_init_attr.wq_context = attrs->ufile;
wq_init_attr.cq = cq;
wq = pd->device->ops.create_wq(pd, &wq_init_attr, &attrs->driver_udata);
if (IS_ERR(wq)) {
ret = PTR_ERR(wq);
goto err;
}
obj->uevent.uobject.object = wq;
wq->wq_type = wq_init_attr.wq_type;
wq->cq = cq;
wq->pd = pd;
wq->device = pd->device;
wq->wq_context = wq_init_attr.wq_context;
atomic_set(&wq->usecnt, 0);
atomic_inc(&pd->usecnt);
atomic_inc(&cq->usecnt);
wq->uobject = obj;
uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_CREATE_WQ_HANDLE);
ret = uverbs_copy_to(attrs, UVERBS_ATTR_CREATE_WQ_RESP_MAX_WR,
&wq_init_attr.max_wr,
sizeof(wq_init_attr.max_wr));
if (ret)
return ret;
ret = uverbs_copy_to(attrs, UVERBS_ATTR_CREATE_WQ_RESP_MAX_SGE,
&wq_init_attr.max_sge,
sizeof(wq_init_attr.max_sge));
if (ret)
return ret;
ret = uverbs_copy_to(attrs, UVERBS_ATTR_CREATE_WQ_RESP_WQ_NUM,
&wq->wq_num,
sizeof(wq->wq_num));
return ret;
err:
if (obj->uevent.event_file)
uverbs_uobject_put(&obj->uevent.event_file->uobj);
return ret;
};
DECLARE_UVERBS_NAMED_METHOD(
UVERBS_METHOD_WQ_CREATE,
UVERBS_ATTR_IDR(UVERBS_ATTR_CREATE_WQ_HANDLE,
UVERBS_OBJECT_WQ,
UVERBS_ACCESS_NEW,
UA_MANDATORY),
UVERBS_ATTR_IDR(UVERBS_ATTR_CREATE_WQ_PD_HANDLE,
UVERBS_OBJECT_PD,
UVERBS_ACCESS_READ,
UA_MANDATORY),
UVERBS_ATTR_CONST_IN(UVERBS_ATTR_CREATE_WQ_TYPE,
enum ib_wq_type,
UA_MANDATORY),
UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_WQ_USER_HANDLE,
UVERBS_ATTR_TYPE(u64),
UA_MANDATORY),
UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_WQ_MAX_WR,
UVERBS_ATTR_TYPE(u32),
UA_MANDATORY),
UVERBS_ATTR_PTR_IN(UVERBS_ATTR_CREATE_WQ_MAX_SGE,
UVERBS_ATTR_TYPE(u32),
UA_MANDATORY),
UVERBS_ATTR_FLAGS_IN(UVERBS_ATTR_CREATE_WQ_FLAGS,
enum ib_uverbs_wq_flags,
UA_MANDATORY),
UVERBS_ATTR_IDR(UVERBS_ATTR_CREATE_WQ_CQ_HANDLE,
UVERBS_OBJECT_CQ,
UVERBS_ACCESS_READ,
UA_OPTIONAL),
UVERBS_ATTR_FD(UVERBS_ATTR_CREATE_WQ_EVENT_FD,
UVERBS_OBJECT_ASYNC_EVENT,
UVERBS_ACCESS_READ,
UA_OPTIONAL),
UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_CREATE_WQ_RESP_MAX_WR,
UVERBS_ATTR_TYPE(u32),
UA_MANDATORY),
UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_CREATE_WQ_RESP_MAX_SGE,
UVERBS_ATTR_TYPE(u32),
UA_MANDATORY),
UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_CREATE_WQ_RESP_WQ_NUM,
UVERBS_ATTR_TYPE(u32),
UA_OPTIONAL),
UVERBS_ATTR_UHW());
static int UVERBS_HANDLER(UVERBS_METHOD_WQ_DESTROY)(
struct uverbs_attr_bundle *attrs)
{
struct ib_uobject *uobj =
uverbs_attr_get_uobject(attrs, UVERBS_ATTR_DESTROY_WQ_HANDLE);
struct ib_uwq_object *obj =
container_of(uobj, struct ib_uwq_object, uevent.uobject);
return uverbs_copy_to(attrs, UVERBS_ATTR_DESTROY_WQ_RESP,
&obj->uevent.events_reported,
sizeof(obj->uevent.events_reported));
}
DECLARE_UVERBS_NAMED_METHOD(
UVERBS_METHOD_WQ_DESTROY,
UVERBS_ATTR_IDR(UVERBS_ATTR_DESTROY_WQ_HANDLE,
UVERBS_OBJECT_WQ,
UVERBS_ACCESS_DESTROY,
UA_MANDATORY),
UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_DESTROY_WQ_RESP,
UVERBS_ATTR_TYPE(u32),
UA_MANDATORY));
DECLARE_UVERBS_NAMED_OBJECT(
UVERBS_OBJECT_WQ,
UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_uwq_object), uverbs_free_wq),
&UVERBS_METHOD(UVERBS_METHOD_WQ_CREATE),
&UVERBS_METHOD(UVERBS_METHOD_WQ_DESTROY)
);
const struct uapi_definition uverbs_def_obj_wq[] = {
UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_WQ,
UAPI_DEF_OBJ_NEEDS_FN(destroy_wq)),
{}
};


@@ -634,6 +634,9 @@ static const struct uapi_definition uverbs_core_api[] = {
UAPI_DEF_CHAIN(uverbs_def_obj_flow_action),
UAPI_DEF_CHAIN(uverbs_def_obj_intf),
UAPI_DEF_CHAIN(uverbs_def_obj_mr),
UAPI_DEF_CHAIN(uverbs_def_obj_qp),
UAPI_DEF_CHAIN(uverbs_def_obj_srq),
UAPI_DEF_CHAIN(uverbs_def_obj_wq),
UAPI_DEF_CHAIN(uverbs_def_write_intf),
{},
};


@@ -50,6 +50,7 @@
#include <rdma/ib_cache.h>
#include <rdma/ib_addr.h>
#include <rdma/rw.h>
#include <rdma/lag.h>
#include "core_priv.h"
#include <trace/events/rdma_core.h>
@@ -500,8 +501,10 @@ rdma_update_sgid_attr(struct rdma_ah_attr *ah_attr,
static struct ib_ah *_rdma_create_ah(struct ib_pd *pd,
struct rdma_ah_attr *ah_attr,
u32 flags,
struct ib_udata *udata)
struct ib_udata *udata,
struct net_device *xmit_slave)
{
struct rdma_ah_init_attr init_attr = {};
struct ib_device *device = pd->device;
struct ib_ah *ah;
int ret;
@@ -521,8 +524,11 @@ static struct ib_ah *_rdma_create_ah(struct ib_pd *pd,
ah->pd = pd;
ah->type = ah_attr->type;
ah->sgid_attr = rdma_update_sgid_attr(ah_attr, NULL);
init_attr.ah_attr = ah_attr;
init_attr.flags = flags;
init_attr.xmit_slave = xmit_slave;
ret = device->ops.create_ah(ah, ah_attr, flags, udata);
ret = device->ops.create_ah(ah, &init_attr, udata);
if (ret) {
kfree(ah);
return ERR_PTR(ret);
@@ -547,15 +553,22 @@ struct ib_ah *rdma_create_ah(struct ib_pd *pd, struct rdma_ah_attr *ah_attr,
u32 flags)
{
const struct ib_gid_attr *old_sgid_attr;
struct net_device *slave;
struct ib_ah *ah;
int ret;
ret = rdma_fill_sgid_attr(pd->device, ah_attr, &old_sgid_attr);
if (ret)
return ERR_PTR(ret);
ah = _rdma_create_ah(pd, ah_attr, flags, NULL);
slave = rdma_lag_get_ah_roce_slave(pd->device, ah_attr,
(flags & RDMA_CREATE_AH_SLEEPABLE) ?
GFP_KERNEL : GFP_ATOMIC);
if (IS_ERR(slave)) {
rdma_unfill_sgid_attr(ah_attr, old_sgid_attr);
return (void *)slave;
}
ah = _rdma_create_ah(pd, ah_attr, flags, NULL, slave);
rdma_lag_put_ah_roce_slave(slave);
rdma_unfill_sgid_attr(ah_attr, old_sgid_attr);
return ah;
}
@@ -594,7 +607,8 @@ struct ib_ah *rdma_create_user_ah(struct ib_pd *pd,
}
}
ah = _rdma_create_ah(pd, ah_attr, RDMA_CREATE_AH_SLEEPABLE, udata);
ah = _rdma_create_ah(pd, ah_attr, RDMA_CREATE_AH_SLEEPABLE,
udata, NULL);
out:
rdma_unfill_sgid_attr(ah_attr, old_sgid_attr);
@@ -967,15 +981,29 @@ EXPORT_SYMBOL(rdma_destroy_ah_user);
/* Shared receive queues */
struct ib_srq *ib_create_srq(struct ib_pd *pd,
struct ib_srq_init_attr *srq_init_attr)
/**
 * ib_create_srq_user - Creates a SRQ associated with the specified protection
 *   domain.
 * @pd: The protection domain associated with the SRQ.
 * @srq_init_attr: A list of initial attributes required to create the
 *   SRQ. If SRQ creation succeeds, then the attributes are updated to
 *   the actual capabilities of the created SRQ.
 * @uobject: uobject pointer if this is not a kernel SRQ
 * @udata: udata pointer if this is not a kernel SRQ
 *
 * srq_attr->max_wr and srq_attr->max_sge are read to determine the
 * requested size of the SRQ, and set to the actual values allocated
 * on return. If ib_create_srq_user() succeeds, then max_wr and max_sge
 * will always be at least as large as the requested values.
 */
struct ib_srq *ib_create_srq_user(struct ib_pd *pd,
struct ib_srq_init_attr *srq_init_attr,
struct ib_usrq_object *uobject,
struct ib_udata *udata)
{
struct ib_srq *srq;
int ret;
if (!pd->device->ops.create_srq)
return ERR_PTR(-EOPNOTSUPP);
srq = rdma_zalloc_drv_obj(pd->device, ib_srq);
if (!srq)
return ERR_PTR(-ENOMEM);
@@ -985,6 +1013,7 @@ struct ib_srq *ib_create_srq(struct ib_pd *pd,
srq->event_handler = srq_init_attr->event_handler;
srq->srq_context = srq_init_attr->srq_context;
srq->srq_type = srq_init_attr->srq_type;
srq->uobject = uobject;
if (ib_srq_has_cq(srq->srq_type)) {
srq->ext.cq = srq_init_attr->ext.cq;
@@ -996,7 +1025,7 @@ struct ib_srq *ib_create_srq(struct ib_pd *pd,
}
atomic_inc(&pd->usecnt);
ret = pd->device->ops.create_srq(srq, srq_init_attr, NULL);
ret = pd->device->ops.create_srq(srq, srq_init_attr, udata);
if (ret) {
atomic_dec(&srq->pd->usecnt);
if (srq->srq_type == IB_SRQT_XRC)
@@ -1009,7 +1038,7 @@ struct ib_srq *ib_create_srq(struct ib_pd *pd,
return srq;
}
EXPORT_SYMBOL(ib_create_srq);
EXPORT_SYMBOL(ib_create_srq_user);
int ib_modify_srq(struct ib_srq *srq,
struct ib_srq_attr *srq_attr,
@@ -1633,11 +1662,35 @@ static int _ib_modify_qp(struct ib_qp *qp, struct ib_qp_attr *attr,
const struct ib_gid_attr *old_sgid_attr_alt_av;
int ret;
attr->xmit_slave = NULL;
if (attr_mask & IB_QP_AV) {
ret = rdma_fill_sgid_attr(qp->device, &attr->ah_attr,
&old_sgid_attr_av);
if (ret)
return ret;
if (attr->ah_attr.type == RDMA_AH_ATTR_TYPE_ROCE &&
is_qp_type_connected(qp)) {
struct net_device *slave;
/*
* If the user provided the qp_attr then we have to
* resolve it. Kernel users have to provide already
* resolved rdma_ah_attr's.
*/
if (udata) {
ret = ib_resolve_eth_dmac(qp->device,
&attr->ah_attr);
if (ret)
goto out_av;
}
slave = rdma_lag_get_ah_roce_slave(qp->device,
&attr->ah_attr,
GFP_KERNEL);
if (IS_ERR(slave)) {
ret = PTR_ERR(slave);
goto out_av;
}
attr->xmit_slave = slave;
}
}
if (attr_mask & IB_QP_ALT_PATH) {
/*
@@ -1664,18 +1717,6 @@ static int _ib_modify_qp(struct ib_qp *qp, struct ib_qp_attr *attr,
}
}
/*
* If the user provided the qp_attr then we have to resolve it. Kernel
* users have to provide already resolved rdma_ah_attr's
*/
if (udata && (attr_mask & IB_QP_AV) &&
attr->ah_attr.type == RDMA_AH_ATTR_TYPE_ROCE &&
is_qp_type_connected(qp)) {
ret = ib_resolve_eth_dmac(qp->device, &attr->ah_attr);
if (ret)
goto out;
}
if (rdma_ib_or_roce(qp->device, port)) {
if (attr_mask & IB_QP_RQ_PSN && attr->rq_psn & ~0xffffff) {
dev_warn(&qp->device->dev,
@@ -1717,8 +1758,10 @@ out:
if (attr_mask & IB_QP_ALT_PATH)
rdma_unfill_sgid_attr(&attr->alt_ah_attr, old_sgid_attr_alt_av);
out_av:
if (attr_mask & IB_QP_AV)
if (attr_mask & IB_QP_AV) {
rdma_lag_put_ah_roce_slave(attr->xmit_slave);
rdma_unfill_sgid_attr(&attr->ah_attr, old_sgid_attr_av);
}
return ret;
}
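The `out_av:` unwind above follows the kernel's reverse-order goto-cleanup idiom: the LAG slave reference taken inside the `IB_QP_AV` branch is dropped before the SGID attribute is unfilled, and an early failure jumps past releases for resources never acquired. A minimal standalone sketch of the same idiom, with hypothetical acquire/release helpers (not the rdma core API):

```c
/* Hypothetical resources; globals record what is held and release order. */
static int sgid_held, slave_held, release_order[2], n_released;

static int acquire_sgid(void) { sgid_held = 1; return 0; }
static int acquire_slave(int fail) { if (fail) return -1; slave_held = 1; return 0; }
static void put_slave(void)
{
	if (slave_held) { slave_held = 0; release_order[n_released++] = 1; }
}
static void unfill_sgid(void)
{
	if (sgid_held) { sgid_held = 0; release_order[n_released++] = 0; }
}

/* Acquire in order; release in reverse order via fall-through labels. */
static int modify(int fail_slave)
{
	int ret;

	ret = acquire_sgid();
	if (ret)
		return ret;
	ret = acquire_slave(fail_slave);
	if (ret)
		goto out_sgid;	/* slave never taken, skip its release */

	put_slave();		/* success path drops the slave reference too */
out_sgid:
	unfill_sgid();
	return ret;
}
```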
@ -1962,6 +2005,9 @@ EXPORT_SYMBOL(__ib_create_cq);
int rdma_set_cq_moderation(struct ib_cq *cq, u16 cq_count, u16 cq_period)
{
if (cq->shared)
return -EOPNOTSUPP;
return cq->device->ops.modify_cq ?
cq->device->ops.modify_cq(cq, cq_count,
cq_period) : -EOPNOTSUPP;
@ -1970,6 +2016,9 @@ EXPORT_SYMBOL(rdma_set_cq_moderation);
int ib_destroy_cq_user(struct ib_cq *cq, struct ib_udata *udata)
{
if (WARN_ON_ONCE(cq->shared))
return -EOPNOTSUPP;
if (atomic_read(&cq->usecnt))
return -EBUSY;
@ -1982,6 +2031,9 @@ EXPORT_SYMBOL(ib_destroy_cq_user);
int ib_resize_cq(struct ib_cq *cq, int cqe)
{
if (cq->shared)
return -EOPNOTSUPP;
return cq->device->ops.resize_cq ?
cq->device->ops.resize_cq(cq, cqe, NULL) : -EOPNOTSUPP;
}
@ -2160,54 +2212,6 @@ out:
}
EXPORT_SYMBOL(ib_alloc_mr_integrity);
/* "Fast" memory regions */
struct ib_fmr *ib_alloc_fmr(struct ib_pd *pd,
int mr_access_flags,
struct ib_fmr_attr *fmr_attr)
{
struct ib_fmr *fmr;
if (!pd->device->ops.alloc_fmr)
return ERR_PTR(-EOPNOTSUPP);
fmr = pd->device->ops.alloc_fmr(pd, mr_access_flags, fmr_attr);
if (!IS_ERR(fmr)) {
fmr->device = pd->device;
fmr->pd = pd;
atomic_inc(&pd->usecnt);
}
return fmr;
}
EXPORT_SYMBOL(ib_alloc_fmr);
int ib_unmap_fmr(struct list_head *fmr_list)
{
struct ib_fmr *fmr;
if (list_empty(fmr_list))
return 0;
fmr = list_entry(fmr_list->next, struct ib_fmr, list);
return fmr->device->ops.unmap_fmr(fmr_list);
}
EXPORT_SYMBOL(ib_unmap_fmr);
int ib_dealloc_fmr(struct ib_fmr *fmr)
{
struct ib_pd *pd;
int ret;
pd = fmr->pd;
ret = fmr->device->ops.dealloc_fmr(fmr);
if (!ret)
atomic_dec(&pd->usecnt);
return ret;
}
EXPORT_SYMBOL(ib_dealloc_fmr);
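The FMR helpers removed above follow the usual PD reference-count contract: allocation pins the protection domain with `atomic_inc(&pd->usecnt)`, and deallocation drops that reference only if the driver's dealloc callback succeeded. A standalone sketch of that invariant using C11 atomics (structure and names hypothetical, not the kernel types):

```c
#include <stdatomic.h>

struct pd { atomic_int usecnt; };
struct mr { struct pd *pd; };

/* Allocation pins the PD so it cannot be destroyed under the MR. */
static int alloc_mr(struct pd *pd, struct mr *mr)
{
	mr->pd = pd;
	atomic_fetch_add(&pd->usecnt, 1);
	return 0;
}

/* The PD reference is dropped only when the driver dealloc succeeds,
 * mirroring the "if (!ret) atomic_dec(&pd->usecnt)" pattern above. */
static int dealloc_mr(struct mr *mr, int driver_ret)
{
	if (!driver_ret)
		atomic_fetch_sub(&mr->pd->usecnt, 1);
	return driver_ret;
}
```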
/* Multicast groups */
static bool is_valid_mcast_lid(struct ib_qp *qp, u16 lid)
@ -2574,6 +2578,7 @@ EXPORT_SYMBOL(ib_map_mr_sg_pi);
* @page_size: page vector desired page size
*
* Constraints:
*
* - The first sg element is allowed to have an offset.
* - Each sg element must either be aligned to page_size or virtually
* contiguous to the previous element. In case an sg element has a
@ -2607,10 +2612,12 @@ EXPORT_SYMBOL(ib_map_mr_sg);
* @mr: memory region
* @sgl: dma mapped scatterlist
* @sg_nents: number of entries in sg
* @sg_offset_p: IN: start offset in bytes into sg
* OUT: offset in bytes for element n of the sg of the first
* @sg_offset_p: ==== =======================================================
* IN start offset in bytes into sg
* OUT offset in bytes for element n of the sg of the first
* byte that has not been processed where n is the return
* value of this function.
* ==== =======================================================
* @set_page: driver page assignment function pointer
*
* Core service helper for drivers to convert the largest


@ -177,9 +177,6 @@ int bnxt_re_query_device(struct ib_device *ibdev,
ib_attr->max_total_mcast_qp_attach = 0;
ib_attr->max_ah = dev_attr->max_ah;
ib_attr->max_fmr = 0;
ib_attr->max_map_per_fmr = 0;
ib_attr->max_srq = dev_attr->max_srq;
ib_attr->max_srq_wr = dev_attr->max_srq_wqes;
ib_attr->max_srq_sge = dev_attr->max_srq_sges;
@ -631,11 +628,12 @@ static u8 bnxt_re_stack_to_dev_nw_type(enum rdma_network_type ntype)
return nw_type;
}
int bnxt_re_create_ah(struct ib_ah *ib_ah, struct rdma_ah_attr *ah_attr,
u32 flags, struct ib_udata *udata)
int bnxt_re_create_ah(struct ib_ah *ib_ah, struct rdma_ah_init_attr *init_attr,
struct ib_udata *udata)
{
struct ib_pd *ib_pd = ib_ah->pd;
struct bnxt_re_pd *pd = container_of(ib_pd, struct bnxt_re_pd, ib_pd);
struct rdma_ah_attr *ah_attr = init_attr->ah_attr;
const struct ib_global_route *grh = rdma_ah_read_grh(ah_attr);
struct bnxt_re_dev *rdev = pd->rdev;
const struct ib_gid_attr *sgid_attr;
@ -673,7 +671,8 @@ int bnxt_re_create_ah(struct ib_ah *ib_ah, struct rdma_ah_attr *ah_attr,
memcpy(ah->qplib_ah.dmac, ah_attr->roce.dmac, ETH_ALEN);
rc = bnxt_qplib_create_ah(&rdev->qplib_res, &ah->qplib_ah,
!(flags & RDMA_CREATE_AH_SLEEPABLE));
!(init_attr->flags &
RDMA_CREATE_AH_SLEEPABLE));
if (rc) {
ibdev_err(&rdev->ibdev, "Failed to allocate HW AH");
return rc;
@ -856,7 +855,7 @@ static int bnxt_re_init_user_qp(struct bnxt_re_dev *rdev, struct bnxt_re_pd *pd,
if (ib_copy_from_udata(&ureq, udata, sizeof(ureq)))
return -EFAULT;
bytes = (qplib_qp->sq.max_wqe * BNXT_QPLIB_MAX_SQE_ENTRY_SIZE);
bytes = (qplib_qp->sq.max_wqe * qplib_qp->sq.wqe_size);
/* Consider mapping PSN search memory only for RC QPs. */
if (qplib_qp->type == CMDQ_CREATE_QP_TYPE_RC) {
psn_sz = bnxt_qplib_is_chip_gen_p5(rdev->chip_ctx) ?
@ -879,7 +878,7 @@ static int bnxt_re_init_user_qp(struct bnxt_re_dev *rdev, struct bnxt_re_pd *pd,
qplib_qp->qp_handle = ureq.qp_handle;
if (!qp->qplib_qp.srq) {
bytes = (qplib_qp->rq.max_wqe * BNXT_QPLIB_MAX_RQE_ENTRY_SIZE);
bytes = (qplib_qp->rq.max_wqe * qplib_qp->rq.wqe_size);
bytes = PAGE_ALIGN(bytes);
umem = ib_umem_get(&rdev->ibdev, ureq.qprva, bytes,
IB_ACCESS_LOCAL_WRITE);
@ -976,6 +975,7 @@ static struct bnxt_re_qp *bnxt_re_create_shadow_qp
qp->qplib_qp.sig_type = true;
/* Shadow QP SQ depth should be same as QP1 RQ depth */
qp->qplib_qp.sq.wqe_size = bnxt_re_get_swqe_size();
qp->qplib_qp.sq.max_wqe = qp1_qp->rq.max_wqe;
qp->qplib_qp.sq.max_sge = 2;
/* Q full delta can be 1 since it is internal QP */
@ -986,6 +986,7 @@ static struct bnxt_re_qp *bnxt_re_create_shadow_qp
qp->qplib_qp.scq = qp1_qp->scq;
qp->qplib_qp.rcq = qp1_qp->rcq;
qp->qplib_qp.rq.wqe_size = bnxt_re_get_rwqe_size();
qp->qplib_qp.rq.max_wqe = qp1_qp->rq.max_wqe;
qp->qplib_qp.rq.max_sge = qp1_qp->rq.max_sge;
/* Q full delta can be 1 since it is internal QP */
@ -1021,10 +1022,12 @@ static int bnxt_re_init_rq_attr(struct bnxt_re_qp *qp,
struct bnxt_qplib_dev_attr *dev_attr;
struct bnxt_qplib_qp *qplqp;
struct bnxt_re_dev *rdev;
struct bnxt_qplib_q *rq;
int entries;
rdev = qp->rdev;
qplqp = &qp->qplib_qp;
rq = &qplqp->rq;
dev_attr = &rdev->dev_attr;
if (init_attr->srq) {
@ -1036,23 +1039,21 @@ static int bnxt_re_init_rq_attr(struct bnxt_re_qp *qp,
return -EINVAL;
}
qplqp->srq = &srq->qplib_srq;
qplqp->rq.max_wqe = 0;
rq->max_wqe = 0;
} else {
rq->wqe_size = bnxt_re_get_rwqe_size();
/* Allocate 1 more than what's provided so posting max doesn't
* mean empty.
*/
entries = roundup_pow_of_two(init_attr->cap.max_recv_wr + 1);
qplqp->rq.max_wqe = min_t(u32, entries,
dev_attr->max_qp_wqes + 1);
qplqp->rq.q_full_delta = qplqp->rq.max_wqe -
init_attr->cap.max_recv_wr;
qplqp->rq.max_sge = init_attr->cap.max_recv_sge;
if (qplqp->rq.max_sge > dev_attr->max_qp_sges)
qplqp->rq.max_sge = dev_attr->max_qp_sges;
rq->max_wqe = min_t(u32, entries, dev_attr->max_qp_wqes + 1);
rq->q_full_delta = rq->max_wqe - init_attr->cap.max_recv_wr;
rq->max_sge = init_attr->cap.max_recv_sge;
if (rq->max_sge > dev_attr->max_qp_sges)
rq->max_sge = dev_attr->max_qp_sges;
}
qplqp->rq.sg_info.pgsize = PAGE_SIZE;
qplqp->rq.sg_info.pgshft = PAGE_SHIFT;
rq->sg_info.pgsize = PAGE_SIZE;
rq->sg_info.pgshft = PAGE_SHIFT;
return 0;
}
@ -1080,15 +1081,18 @@ static void bnxt_re_init_sq_attr(struct bnxt_re_qp *qp,
struct bnxt_qplib_dev_attr *dev_attr;
struct bnxt_qplib_qp *qplqp;
struct bnxt_re_dev *rdev;
struct bnxt_qplib_q *sq;
int entries;
rdev = qp->rdev;
qplqp = &qp->qplib_qp;
sq = &qplqp->sq;
dev_attr = &rdev->dev_attr;
qplqp->sq.max_sge = init_attr->cap.max_send_sge;
if (qplqp->sq.max_sge > dev_attr->max_qp_sges)
qplqp->sq.max_sge = dev_attr->max_qp_sges;
sq->wqe_size = bnxt_re_get_swqe_size();
sq->max_sge = init_attr->cap.max_send_sge;
if (sq->max_sge > dev_attr->max_qp_sges)
sq->max_sge = dev_attr->max_qp_sges;
/*
* Change the SQ depth if user has requested minimum using
* configfs. Only supported for kernel consumers
@ -1096,9 +1100,9 @@ static void bnxt_re_init_sq_attr(struct bnxt_re_qp *qp,
entries = init_attr->cap.max_send_wr;
/* Allocate 128 + 1 more than what's provided */
entries = roundup_pow_of_two(entries + BNXT_QPLIB_RESERVED_QP_WRS + 1);
qplqp->sq.max_wqe = min_t(u32, entries, dev_attr->max_qp_wqes +
BNXT_QPLIB_RESERVED_QP_WRS + 1);
qplqp->sq.q_full_delta = BNXT_QPLIB_RESERVED_QP_WRS + 1;
sq->max_wqe = min_t(u32, entries, dev_attr->max_qp_wqes +
BNXT_QPLIB_RESERVED_QP_WRS + 1);
sq->q_full_delta = BNXT_QPLIB_RESERVED_QP_WRS + 1;
/*
* Reserving one slot for Phantom WQE. Application can
* post one extra entry in this case. But allowing this to avoid
@ -1511,7 +1515,7 @@ static int bnxt_re_init_user_srq(struct bnxt_re_dev *rdev,
if (ib_copy_from_udata(&ureq, udata, sizeof(ureq)))
return -EFAULT;
bytes = (qplib_srq->max_wqe * BNXT_QPLIB_MAX_RQE_ENTRY_SIZE);
bytes = (qplib_srq->max_wqe * qplib_srq->wqe_size);
bytes = PAGE_ALIGN(bytes);
umem = ib_umem_get(&rdev->ibdev, ureq.srqva, bytes,
IB_ACCESS_LOCAL_WRITE);
@ -1534,15 +1538,20 @@ int bnxt_re_create_srq(struct ib_srq *ib_srq,
struct ib_srq_init_attr *srq_init_attr,
struct ib_udata *udata)
{
struct ib_pd *ib_pd = ib_srq->pd;
struct bnxt_re_pd *pd = container_of(ib_pd, struct bnxt_re_pd, ib_pd);
struct bnxt_re_dev *rdev = pd->rdev;
struct bnxt_qplib_dev_attr *dev_attr = &rdev->dev_attr;
struct bnxt_re_srq *srq =
container_of(ib_srq, struct bnxt_re_srq, ib_srq);
struct bnxt_qplib_dev_attr *dev_attr;
struct bnxt_qplib_nq *nq = NULL;
struct bnxt_re_dev *rdev;
struct bnxt_re_srq *srq;
struct bnxt_re_pd *pd;
struct ib_pd *ib_pd;
int rc, entries;
ib_pd = ib_srq->pd;
pd = container_of(ib_pd, struct bnxt_re_pd, ib_pd);
rdev = pd->rdev;
dev_attr = &rdev->dev_attr;
srq = container_of(ib_srq, struct bnxt_re_srq, ib_srq);
if (srq_init_attr->attr.max_wr >= dev_attr->max_srq_wqes) {
ibdev_err(&rdev->ibdev, "Create SRQ failed - max exceeded");
rc = -EINVAL;
@ -1563,8 +1572,9 @@ int bnxt_re_create_srq(struct ib_srq *ib_srq,
entries = roundup_pow_of_two(srq_init_attr->attr.max_wr + 1);
if (entries > dev_attr->max_srq_wqes + 1)
entries = dev_attr->max_srq_wqes + 1;
srq->qplib_srq.max_wqe = entries;
srq->qplib_srq.wqe_size = bnxt_re_get_rwqe_size();
srq->qplib_srq.max_sge = srq_init_attr->attr.max_sge;
srq->qplib_srq.threshold = srq_init_attr->attr.srq_limit;
srq->srq_limit = srq_init_attr->attr.srq_limit;


@ -122,12 +122,6 @@ struct bnxt_re_frpl {
u64 *page_list;
};
struct bnxt_re_fmr {
struct bnxt_re_dev *rdev;
struct ib_fmr ib_fmr;
struct bnxt_qplib_mrw qplib_fmr;
};
struct bnxt_re_mw {
struct bnxt_re_dev *rdev;
struct ib_mw ib_mw;
@ -142,6 +136,16 @@ struct bnxt_re_ucontext {
spinlock_t sh_lock; /* protect shpg */
};
static inline u16 bnxt_re_get_swqe_size(void)
{
return sizeof(struct sq_send);
}
static inline u16 bnxt_re_get_rwqe_size(void)
{
return sizeof(struct rq_wqe);
}
int bnxt_re_query_device(struct ib_device *ibdev,
struct ib_device_attr *ib_attr,
struct ib_udata *udata);
@ -160,7 +164,7 @@ enum rdma_link_layer bnxt_re_get_link_layer(struct ib_device *ibdev,
u8 port_num);
int bnxt_re_alloc_pd(struct ib_pd *pd, struct ib_udata *udata);
void bnxt_re_dealloc_pd(struct ib_pd *pd, struct ib_udata *udata);
int bnxt_re_create_ah(struct ib_ah *ah, struct rdma_ah_attr *ah_attr, u32 flags,
int bnxt_re_create_ah(struct ib_ah *ah, struct rdma_ah_init_attr *init_attr,
struct ib_udata *udata);
int bnxt_re_modify_ah(struct ib_ah *ah, struct rdma_ah_attr *ah_attr);
int bnxt_re_query_ah(struct ib_ah *ah, struct rdma_ah_attr *ah_attr);


@ -300,12 +300,12 @@ static void bnxt_qplib_service_nq(unsigned long data)
{
struct bnxt_qplib_nq *nq = (struct bnxt_qplib_nq *)data;
struct bnxt_qplib_hwq *hwq = &nq->hwq;
struct nq_base *nqe, **nq_ptr;
struct bnxt_qplib_cq *cq;
int num_cqne_processed = 0;
int num_srqne_processed = 0;
int num_cqne_processed = 0;
struct bnxt_qplib_cq *cq;
int budget = nq->budget;
u32 sw_cons, raw_cons;
struct nq_base *nqe;
uintptr_t q_handle;
u16 type;
@ -314,8 +314,7 @@ static void bnxt_qplib_service_nq(unsigned long data)
raw_cons = hwq->cons;
while (budget--) {
sw_cons = HWQ_CMP(raw_cons, hwq);
nq_ptr = (struct nq_base **)hwq->pbl_ptr;
nqe = &nq_ptr[NQE_PG(sw_cons)][NQE_IDX(sw_cons)];
nqe = bnxt_qplib_get_qe(hwq, sw_cons, NULL);
if (!NQE_CMP_VALID(nqe, raw_cons, hwq->max_elements))
break;
@ -392,13 +391,11 @@ static irqreturn_t bnxt_qplib_nq_irq(int irq, void *dev_instance)
{
struct bnxt_qplib_nq *nq = dev_instance;
struct bnxt_qplib_hwq *hwq = &nq->hwq;
struct nq_base **nq_ptr;
u32 sw_cons;
/* Prefetch the NQ element */
sw_cons = HWQ_CMP(hwq->cons, hwq);
nq_ptr = (struct nq_base **)nq->hwq.pbl_ptr;
prefetch(&nq_ptr[NQE_PG(sw_cons)][NQE_IDX(sw_cons)]);
prefetch(bnxt_qplib_get_qe(hwq, sw_cons, NULL));
/* Fan out to CPU affinitized kthreads? */
tasklet_schedule(&nq->nq_tasklet);
@ -612,12 +609,13 @@ int bnxt_qplib_create_srq(struct bnxt_qplib_res *res,
struct cmdq_create_srq req;
struct bnxt_qplib_pbl *pbl;
u16 cmd_flags = 0;
u16 pg_sz_lvl;
int rc, idx;
hwq_attr.res = res;
hwq_attr.sginfo = &srq->sg_info;
hwq_attr.depth = srq->max_wqe;
hwq_attr.stride = BNXT_QPLIB_MAX_RQE_ENTRY_SIZE;
hwq_attr.stride = srq->wqe_size;
hwq_attr.type = HWQ_TYPE_QUEUE;
rc = bnxt_qplib_alloc_init_hwq(&srq->hwq, &hwq_attr);
if (rc)
@ -638,22 +636,11 @@ int bnxt_qplib_create_srq(struct bnxt_qplib_res *res,
req.srq_size = cpu_to_le16((u16)srq->hwq.max_elements);
pbl = &srq->hwq.pbl[PBL_LVL_0];
req.pg_size_lvl = cpu_to_le16((((u16)srq->hwq.level &
CMDQ_CREATE_SRQ_LVL_MASK) <<
CMDQ_CREATE_SRQ_LVL_SFT) |
(pbl->pg_size == ROCE_PG_SIZE_4K ?
CMDQ_CREATE_SRQ_PG_SIZE_PG_4K :
pbl->pg_size == ROCE_PG_SIZE_8K ?
CMDQ_CREATE_SRQ_PG_SIZE_PG_8K :
pbl->pg_size == ROCE_PG_SIZE_64K ?
CMDQ_CREATE_SRQ_PG_SIZE_PG_64K :
pbl->pg_size == ROCE_PG_SIZE_2M ?
CMDQ_CREATE_SRQ_PG_SIZE_PG_2M :
pbl->pg_size == ROCE_PG_SIZE_8M ?
CMDQ_CREATE_SRQ_PG_SIZE_PG_8M :
pbl->pg_size == ROCE_PG_SIZE_1G ?
CMDQ_CREATE_SRQ_PG_SIZE_PG_1G :
CMDQ_CREATE_SRQ_PG_SIZE_PG_4K));
pg_sz_lvl = ((u16)bnxt_qplib_base_pg_size(&srq->hwq) <<
CMDQ_CREATE_SRQ_PG_SIZE_SFT);
pg_sz_lvl |= (srq->hwq.level & CMDQ_CREATE_SRQ_LVL_MASK) <<
CMDQ_CREATE_SRQ_LVL_SFT;
req.pg_size_lvl = cpu_to_le16(pg_sz_lvl);
req.pbl = cpu_to_le64(pbl->pg_map_arr[0]);
req.pd_id = cpu_to_le32(srq->pd->id);
req.eventq_id = cpu_to_le16(srq->eventq_hw_ring_id);
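The `bnxt_qplib_base_pg_size()` helper used above (and in the QP1/QP/CQ hunks below) replaces the repeated six-way ternary chain with one lookup from the level-0 PBL page size to the firmware page-size encoding. A plausible sketch of that mapping, with encoding values assumed rather than taken from the driver headers:

```c
/* Assumed firmware encodings for the CMDQ_*_PG_SIZE_* field. */
enum { PG_4K = 0, PG_8K = 1, PG_64K = 2, PG_2M = 3, PG_8M = 4, PG_1G = 5 };

/* Map a level-0 PBL page size in bytes to its firmware encoding,
 * defaulting to 4K exactly as the old ternary chain did. */
static unsigned int base_pg_size(unsigned long pg_size)
{
	switch (pg_size) {
	case 8 * 1024:		return PG_8K;
	case 64 * 1024:		return PG_64K;
	case 2 * 1024 * 1024:	return PG_2M;
	case 8 * 1024 * 1024:	return PG_8M;
	case 1024 * 1024 * 1024: return PG_1G;
	case 4 * 1024:
	default:		return PG_4K;
	}
}
```

The caller then shifts this into place and ORs in the PBL level, as the `pg_sz_lvl` lines in the diff do.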
@ -740,7 +727,7 @@ int bnxt_qplib_post_srq_recv(struct bnxt_qplib_srq *srq,
struct bnxt_qplib_swqe *wqe)
{
struct bnxt_qplib_hwq *srq_hwq = &srq->hwq;
struct rq_wqe *srqe, **srqe_ptr;
struct rq_wqe *srqe;
struct sq_sge *hw_sge;
u32 sw_prod, sw_cons, count = 0;
int i, rc = 0, next;
@ -758,9 +745,8 @@ int bnxt_qplib_post_srq_recv(struct bnxt_qplib_srq *srq,
spin_unlock(&srq_hwq->lock);
sw_prod = HWQ_CMP(srq_hwq->prod, srq_hwq);
srqe_ptr = (struct rq_wqe **)srq_hwq->pbl_ptr;
srqe = &srqe_ptr[RQE_PG(sw_prod)][RQE_IDX(sw_prod)];
memset(srqe, 0, BNXT_QPLIB_MAX_RQE_ENTRY_SIZE);
srqe = bnxt_qplib_get_qe(srq_hwq, sw_prod, NULL);
memset(srqe, 0, srq->wqe_size);
/* Calculate wqe_size16 and data_len */
for (i = 0, hw_sge = (struct sq_sge *)srqe->data;
i < wqe->num_sge; i++, hw_sge++) {
@ -809,6 +795,7 @@ int bnxt_qplib_create_qp1(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
struct bnxt_qplib_pbl *pbl;
u16 cmd_flags = 0;
u32 qp_flags = 0;
u8 pg_sz_lvl;
int rc;
RCFW_CMD_PREP(req, CREATE_QP1, cmd_flags);
@ -822,7 +809,7 @@ int bnxt_qplib_create_qp1(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
hwq_attr.res = res;
hwq_attr.sginfo = &sq->sg_info;
hwq_attr.depth = sq->max_wqe;
hwq_attr.stride = BNXT_QPLIB_MAX_SQE_ENTRY_SIZE;
hwq_attr.stride = sq->wqe_size;
hwq_attr.type = HWQ_TYPE_QUEUE;
rc = bnxt_qplib_alloc_init_hwq(&sq->hwq, &hwq_attr);
if (rc)
@ -835,33 +822,18 @@ int bnxt_qplib_create_qp1(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
}
pbl = &sq->hwq.pbl[PBL_LVL_0];
req.sq_pbl = cpu_to_le64(pbl->pg_map_arr[0]);
req.sq_pg_size_sq_lvl =
((sq->hwq.level & CMDQ_CREATE_QP1_SQ_LVL_MASK)
<< CMDQ_CREATE_QP1_SQ_LVL_SFT) |
(pbl->pg_size == ROCE_PG_SIZE_4K ?
CMDQ_CREATE_QP1_SQ_PG_SIZE_PG_4K :
pbl->pg_size == ROCE_PG_SIZE_8K ?
CMDQ_CREATE_QP1_SQ_PG_SIZE_PG_8K :
pbl->pg_size == ROCE_PG_SIZE_64K ?
CMDQ_CREATE_QP1_SQ_PG_SIZE_PG_64K :
pbl->pg_size == ROCE_PG_SIZE_2M ?
CMDQ_CREATE_QP1_SQ_PG_SIZE_PG_2M :
pbl->pg_size == ROCE_PG_SIZE_8M ?
CMDQ_CREATE_QP1_SQ_PG_SIZE_PG_8M :
pbl->pg_size == ROCE_PG_SIZE_1G ?
CMDQ_CREATE_QP1_SQ_PG_SIZE_PG_1G :
CMDQ_CREATE_QP1_SQ_PG_SIZE_PG_4K);
pg_sz_lvl = (bnxt_qplib_base_pg_size(&sq->hwq) <<
CMDQ_CREATE_QP1_SQ_PG_SIZE_SFT);
pg_sz_lvl |= (sq->hwq.level & CMDQ_CREATE_QP1_SQ_LVL_MASK);
req.sq_pg_size_sq_lvl = pg_sz_lvl;
if (qp->scq)
req.scq_cid = cpu_to_le32(qp->scq->id);
qp_flags |= CMDQ_CREATE_QP1_QP_FLAGS_RESERVED_LKEY_ENABLE;
/* RQ */
if (rq->max_wqe) {
hwq_attr.res = res;
hwq_attr.sginfo = &rq->sg_info;
hwq_attr.stride = BNXT_QPLIB_MAX_RQE_ENTRY_SIZE;
hwq_attr.stride = rq->wqe_size;
hwq_attr.depth = qp->rq.max_wqe;
hwq_attr.type = HWQ_TYPE_QUEUE;
rc = bnxt_qplib_alloc_init_hwq(&rq->hwq, &hwq_attr);
@ -876,32 +848,20 @@ int bnxt_qplib_create_qp1(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
}
pbl = &rq->hwq.pbl[PBL_LVL_0];
req.rq_pbl = cpu_to_le64(pbl->pg_map_arr[0]);
req.rq_pg_size_rq_lvl =
((rq->hwq.level & CMDQ_CREATE_QP1_RQ_LVL_MASK) <<
CMDQ_CREATE_QP1_RQ_LVL_SFT) |
(pbl->pg_size == ROCE_PG_SIZE_4K ?
CMDQ_CREATE_QP1_RQ_PG_SIZE_PG_4K :
pbl->pg_size == ROCE_PG_SIZE_8K ?
CMDQ_CREATE_QP1_RQ_PG_SIZE_PG_8K :
pbl->pg_size == ROCE_PG_SIZE_64K ?
CMDQ_CREATE_QP1_RQ_PG_SIZE_PG_64K :
pbl->pg_size == ROCE_PG_SIZE_2M ?
CMDQ_CREATE_QP1_RQ_PG_SIZE_PG_2M :
pbl->pg_size == ROCE_PG_SIZE_8M ?
CMDQ_CREATE_QP1_RQ_PG_SIZE_PG_8M :
pbl->pg_size == ROCE_PG_SIZE_1G ?
CMDQ_CREATE_QP1_RQ_PG_SIZE_PG_1G :
CMDQ_CREATE_QP1_RQ_PG_SIZE_PG_4K);
pg_sz_lvl = (bnxt_qplib_base_pg_size(&rq->hwq) <<
CMDQ_CREATE_QP1_RQ_PG_SIZE_SFT);
pg_sz_lvl |= (rq->hwq.level & CMDQ_CREATE_QP1_RQ_LVL_MASK);
req.rq_pg_size_rq_lvl = pg_sz_lvl;
if (qp->rcq)
req.rcq_cid = cpu_to_le32(qp->rcq->id);
}
/* Header buffer - allow hdr_buf pass in */
rc = bnxt_qplib_alloc_qp_hdr_buf(res, qp);
if (rc) {
rc = -ENOMEM;
goto fail;
}
qp_flags |= CMDQ_CREATE_QP1_QP_FLAGS_RESERVED_LKEY_ENABLE;
req.qp_flags = cpu_to_le32(qp_flags);
req.sq_size = cpu_to_le32(sq->hwq.max_elements);
req.rq_size = cpu_to_le32(rq->hwq.max_elements);
@ -948,23 +908,47 @@ exit:
return rc;
}
static void bnxt_qplib_init_psn_ptr(struct bnxt_qplib_qp *qp, int size)
{
struct bnxt_qplib_hwq *hwq;
struct bnxt_qplib_q *sq;
u64 fpsne, psne, psn_pg;
u16 indx_pad = 0, indx;
u16 pg_num, pg_indx;
u64 *page;
sq = &qp->sq;
hwq = &sq->hwq;
fpsne = (u64)bnxt_qplib_get_qe(hwq, hwq->max_elements, &psn_pg);
if (!IS_ALIGNED(fpsne, PAGE_SIZE))
indx_pad = ALIGN(fpsne, PAGE_SIZE) / size;
page = (u64 *)psn_pg;
for (indx = 0; indx < hwq->max_elements; indx++) {
pg_num = (indx + indx_pad) / (PAGE_SIZE / size);
pg_indx = (indx + indx_pad) % (PAGE_SIZE / size);
psne = page[pg_num] + pg_indx * size;
sq->swq[indx].psn_ext = (struct sq_psn_search_ext *)psne;
sq->swq[indx].psn_search = (struct sq_psn_search *)psne;
}
}
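`bnxt_qplib_init_psn_ptr()` above spreads fixed-size PSN search entries across the pages that follow the SQ, padding the first page when the PSN area does not start page-aligned. The page/index arithmetic from its loop can be checked in isolation (values illustrative):

```c
#define PG 4096UL	/* assumed PAGE_SIZE for the sketch */

/* For PSN entry `indx` with `indx_pad` leading pad entries of `size`
 * bytes each, return the page number and within-page index, mirroring
 * the pg_num/pg_indx computation in the loop above. */
static void psn_locate(unsigned int indx, unsigned int indx_pad,
		       unsigned int size, unsigned int *pg_num,
		       unsigned int *pg_indx)
{
	*pg_num  = (indx + indx_pad) / (PG / size);
	*pg_indx = (indx + indx_pad) % (PG / size);
}
```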
int bnxt_qplib_create_qp(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
{
struct bnxt_qplib_rcfw *rcfw = res->rcfw;
struct bnxt_qplib_hwq_attr hwq_attr = {};
unsigned long int psn_search, poff = 0;
struct bnxt_qplib_sg_info sginfo = {};
struct sq_psn_search **psn_search_ptr;
struct bnxt_qplib_q *sq = &qp->sq;
struct bnxt_qplib_q *rq = &qp->rq;
int i, rc, req_size, psn_sz = 0;
struct sq_send **hw_sq_send_ptr;
struct creq_create_qp_resp resp;
int rc, req_size, psn_sz = 0;
struct bnxt_qplib_hwq *xrrq;
u16 cmd_flags = 0, max_ssge;
struct cmdq_create_qp req;
struct bnxt_qplib_pbl *pbl;
struct cmdq_create_qp req;
u32 qp_flags = 0;
u8 pg_sz_lvl;
u16 max_rsge;
RCFW_CMD_PREP(req, CREATE_QP, cmd_flags);
@ -983,7 +967,7 @@ int bnxt_qplib_create_qp(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
hwq_attr.res = res;
hwq_attr.sginfo = &sq->sg_info;
hwq_attr.stride = BNXT_QPLIB_MAX_SQE_ENTRY_SIZE;
hwq_attr.stride = sq->wqe_size;
hwq_attr.depth = sq->max_wqe;
hwq_attr.aux_stride = psn_sz;
hwq_attr.aux_depth = hwq_attr.depth;
@ -997,64 +981,25 @@ int bnxt_qplib_create_qp(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
rc = -ENOMEM;
goto fail_sq;
}
hw_sq_send_ptr = (struct sq_send **)sq->hwq.pbl_ptr;
if (psn_sz) {
psn_search_ptr = (struct sq_psn_search **)
&hw_sq_send_ptr[get_sqe_pg
(sq->hwq.max_elements)];
psn_search = (unsigned long int)
&hw_sq_send_ptr[get_sqe_pg(sq->hwq.max_elements)]
[get_sqe_idx(sq->hwq.max_elements)];
if (psn_search & ~PAGE_MASK) {
/* If the psn_search does not start on a page boundary,
* then calculate the offset
*/
poff = (psn_search & ~PAGE_MASK) /
BNXT_QPLIB_MAX_PSNE_ENTRY_SIZE;
}
for (i = 0; i < sq->hwq.max_elements; i++) {
sq->swq[i].psn_search =
&psn_search_ptr[get_psne_pg(i + poff)]
[get_psne_idx(i + poff)];
/*psns_ext will be used only for P5 chips. */
sq->swq[i].psn_ext =
(struct sq_psn_search_ext *)
&psn_search_ptr[get_psne_pg(i + poff)]
[get_psne_idx(i + poff)];
}
}
if (psn_sz)
bnxt_qplib_init_psn_ptr(qp, psn_sz);
pbl = &sq->hwq.pbl[PBL_LVL_0];
req.sq_pbl = cpu_to_le64(pbl->pg_map_arr[0]);
req.sq_pg_size_sq_lvl =
((sq->hwq.level & CMDQ_CREATE_QP_SQ_LVL_MASK)
<< CMDQ_CREATE_QP_SQ_LVL_SFT) |
(pbl->pg_size == ROCE_PG_SIZE_4K ?
CMDQ_CREATE_QP_SQ_PG_SIZE_PG_4K :
pbl->pg_size == ROCE_PG_SIZE_8K ?
CMDQ_CREATE_QP_SQ_PG_SIZE_PG_8K :
pbl->pg_size == ROCE_PG_SIZE_64K ?
CMDQ_CREATE_QP_SQ_PG_SIZE_PG_64K :
pbl->pg_size == ROCE_PG_SIZE_2M ?
CMDQ_CREATE_QP_SQ_PG_SIZE_PG_2M :
pbl->pg_size == ROCE_PG_SIZE_8M ?
CMDQ_CREATE_QP_SQ_PG_SIZE_PG_8M :
pbl->pg_size == ROCE_PG_SIZE_1G ?
CMDQ_CREATE_QP_SQ_PG_SIZE_PG_1G :
CMDQ_CREATE_QP_SQ_PG_SIZE_PG_4K);
pg_sz_lvl = (bnxt_qplib_base_pg_size(&sq->hwq) <<
CMDQ_CREATE_QP_SQ_PG_SIZE_SFT);
pg_sz_lvl |= (sq->hwq.level & CMDQ_CREATE_QP_SQ_LVL_MASK);
req.sq_pg_size_sq_lvl = pg_sz_lvl;
if (qp->scq)
req.scq_cid = cpu_to_le32(qp->scq->id);
qp_flags |= CMDQ_CREATE_QP_QP_FLAGS_RESERVED_LKEY_ENABLE;
qp_flags |= CMDQ_CREATE_QP_QP_FLAGS_FR_PMR_ENABLED;
if (qp->sig_type)
qp_flags |= CMDQ_CREATE_QP_QP_FLAGS_FORCE_COMPLETION;
/* RQ */
if (rq->max_wqe) {
hwq_attr.res = res;
hwq_attr.sginfo = &rq->sg_info;
hwq_attr.stride = BNXT_QPLIB_MAX_RQE_ENTRY_SIZE;
hwq_attr.stride = rq->wqe_size;
hwq_attr.depth = rq->max_wqe;
hwq_attr.aux_stride = 0;
hwq_attr.aux_depth = 0;
@ -1071,22 +1016,10 @@ int bnxt_qplib_create_qp(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
}
pbl = &rq->hwq.pbl[PBL_LVL_0];
req.rq_pbl = cpu_to_le64(pbl->pg_map_arr[0]);
req.rq_pg_size_rq_lvl =
((rq->hwq.level & CMDQ_CREATE_QP_RQ_LVL_MASK) <<
CMDQ_CREATE_QP_RQ_LVL_SFT) |
(pbl->pg_size == ROCE_PG_SIZE_4K ?
CMDQ_CREATE_QP_RQ_PG_SIZE_PG_4K :
pbl->pg_size == ROCE_PG_SIZE_8K ?
CMDQ_CREATE_QP_RQ_PG_SIZE_PG_8K :
pbl->pg_size == ROCE_PG_SIZE_64K ?
CMDQ_CREATE_QP_RQ_PG_SIZE_PG_64K :
pbl->pg_size == ROCE_PG_SIZE_2M ?
CMDQ_CREATE_QP_RQ_PG_SIZE_PG_2M :
pbl->pg_size == ROCE_PG_SIZE_8M ?
CMDQ_CREATE_QP_RQ_PG_SIZE_PG_8M :
pbl->pg_size == ROCE_PG_SIZE_1G ?
CMDQ_CREATE_QP_RQ_PG_SIZE_PG_1G :
CMDQ_CREATE_QP_RQ_PG_SIZE_PG_4K);
pg_sz_lvl = (bnxt_qplib_base_pg_size(&rq->hwq) <<
CMDQ_CREATE_QP_RQ_PG_SIZE_SFT);
pg_sz_lvl |= (rq->hwq.level & CMDQ_CREATE_QP_RQ_LVL_MASK);
req.rq_pg_size_rq_lvl = pg_sz_lvl;
} else {
/* SRQ */
if (qp->srq) {
@ -1097,7 +1030,13 @@ int bnxt_qplib_create_qp(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp)
if (qp->rcq)
req.rcq_cid = cpu_to_le32(qp->rcq->id);
qp_flags |= CMDQ_CREATE_QP_QP_FLAGS_RESERVED_LKEY_ENABLE;
qp_flags |= CMDQ_CREATE_QP_QP_FLAGS_FR_PMR_ENABLED;
if (qp->sig_type)
qp_flags |= CMDQ_CREATE_QP_QP_FLAGS_FORCE_COMPLETION;
req.qp_flags = cpu_to_le32(qp_flags);
req.sq_size = cpu_to_le32(sq->hwq.max_elements);
req.rq_size = cpu_to_le32(rq->hwq.max_elements);
qp->sq_hdr_buf = NULL;
@ -1483,12 +1422,11 @@ bail:
static void __clean_cq(struct bnxt_qplib_cq *cq, u64 qp)
{
struct bnxt_qplib_hwq *cq_hwq = &cq->hwq;
struct cq_base *hw_cqe, **hw_cqe_ptr;
struct cq_base *hw_cqe;
int i;
for (i = 0; i < cq_hwq->max_elements; i++) {
hw_cqe_ptr = (struct cq_base **)cq_hwq->pbl_ptr;
hw_cqe = &hw_cqe_ptr[CQE_PG(i)][CQE_IDX(i)];
hw_cqe = bnxt_qplib_get_qe(cq_hwq, i, NULL);
if (!CQE_CMP_VALID(hw_cqe, i, cq_hwq->max_elements))
continue;
/*
@ -1615,6 +1553,34 @@ void *bnxt_qplib_get_qp1_rq_buf(struct bnxt_qplib_qp *qp,
return NULL;
}
static void bnxt_qplib_fill_psn_search(struct bnxt_qplib_qp *qp,
struct bnxt_qplib_swqe *wqe,
struct bnxt_qplib_swq *swq)
{
struct sq_psn_search_ext *psns_ext;
struct sq_psn_search *psns;
u32 flg_npsn;
u32 op_spsn;
psns = swq->psn_search;
psns_ext = swq->psn_ext;
op_spsn = ((swq->start_psn << SQ_PSN_SEARCH_START_PSN_SFT) &
SQ_PSN_SEARCH_START_PSN_MASK);
op_spsn |= ((wqe->type << SQ_PSN_SEARCH_OPCODE_SFT) &
SQ_PSN_SEARCH_OPCODE_MASK);
flg_npsn = ((swq->next_psn << SQ_PSN_SEARCH_NEXT_PSN_SFT) &
SQ_PSN_SEARCH_NEXT_PSN_MASK);
if (bnxt_qplib_is_chip_gen_p5(qp->cctx)) {
psns_ext->opcode_start_psn = cpu_to_le32(op_spsn);
psns_ext->flags_next_psn = cpu_to_le32(flg_npsn);
} else {
psns->opcode_start_psn = cpu_to_le32(op_spsn);
psns->flags_next_psn = cpu_to_le32(flg_npsn);
}
}
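The new `bnxt_qplib_fill_psn_search()` packs the start PSN and opcode into one 32-bit word and the next PSN into another, then writes whichever layout (`psns` or `psns_ext`) the chip generation uses. With assumed shift/mask values (24-bit PSN in the low bits, opcode above it; the real constants live in the hardware headers), the packing step looks like:

```c
#include <stdint.h>

/* Assumed field layout: PSN in bits 0..23, opcode starting at bit 24. */
#define START_PSN_SFT	0
#define START_PSN_MASK	0x00ffffffu
#define OPCODE_SFT	24
#define OPCODE_MASK	0xff000000u

/* Pack start PSN and WQE opcode into one little-endian-ready word. */
static uint32_t pack_op_spsn(uint32_t start_psn, uint32_t opcode)
{
	uint32_t v;

	v  = (start_psn << START_PSN_SFT) & START_PSN_MASK;
	v |= (opcode << OPCODE_SFT) & OPCODE_MASK;
	return v;
}
```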
void bnxt_qplib_post_send_db(struct bnxt_qplib_qp *qp)
{
struct bnxt_qplib_q *sq = &qp->sq;
@ -1625,16 +1591,16 @@ void bnxt_qplib_post_send_db(struct bnxt_qplib_qp *qp)
int bnxt_qplib_post_send(struct bnxt_qplib_qp *qp,
struct bnxt_qplib_swqe *wqe)
{
struct bnxt_qplib_q *sq = &qp->sq;
struct bnxt_qplib_swq *swq;
struct sq_send *hw_sq_send_hdr, **hw_sq_send_ptr;
struct sq_sge *hw_sge;
struct bnxt_qplib_nq_work *nq_work = NULL;
bool sch_handler = false;
u32 sw_prod;
u8 wqe_size16;
int i, rc = 0, data_len = 0, pkt_num = 0;
struct bnxt_qplib_q *sq = &qp->sq;
struct sq_send *hw_sq_send_hdr;
struct bnxt_qplib_swq *swq;
bool sch_handler = false;
struct sq_sge *hw_sge;
u8 wqe_size16;
__le32 temp32;
u32 sw_prod;
if (qp->state != CMDQ_MODIFY_QP_NEW_STATE_RTS) {
if (qp->state == CMDQ_MODIFY_QP_NEW_STATE_ERR) {
@ -1663,11 +1629,8 @@ int bnxt_qplib_post_send(struct bnxt_qplib_qp *qp,
swq->flags |= SQ_SEND_FLAGS_SIGNAL_COMP;
swq->start_psn = sq->psn & BTH_PSN_MASK;
hw_sq_send_ptr = (struct sq_send **)sq->hwq.pbl_ptr;
hw_sq_send_hdr = &hw_sq_send_ptr[get_sqe_pg(sw_prod)]
[get_sqe_idx(sw_prod)];
memset(hw_sq_send_hdr, 0, BNXT_QPLIB_MAX_SQE_ENTRY_SIZE);
hw_sq_send_hdr = bnxt_qplib_get_qe(&sq->hwq, sw_prod, NULL);
memset(hw_sq_send_hdr, 0, sq->wqe_size);
if (wqe->flags & BNXT_QPLIB_SWQE_FLAGS_INLINE) {
/* Copy the inline data */
@ -1854,28 +1817,8 @@ int bnxt_qplib_post_send(struct bnxt_qplib_qp *qp,
goto done;
}
swq->next_psn = sq->psn & BTH_PSN_MASK;
if (swq->psn_search) {
u32 opcd_spsn;
u32 flg_npsn;
opcd_spsn = ((swq->start_psn << SQ_PSN_SEARCH_START_PSN_SFT) &
SQ_PSN_SEARCH_START_PSN_MASK);
opcd_spsn |= ((wqe->type << SQ_PSN_SEARCH_OPCODE_SFT) &
SQ_PSN_SEARCH_OPCODE_MASK);
flg_npsn = ((swq->next_psn << SQ_PSN_SEARCH_NEXT_PSN_SFT) &
SQ_PSN_SEARCH_NEXT_PSN_MASK);
if (bnxt_qplib_is_chip_gen_p5(qp->cctx)) {
swq->psn_ext->opcode_start_psn =
cpu_to_le32(opcd_spsn);
swq->psn_ext->flags_next_psn =
cpu_to_le32(flg_npsn);
} else {
swq->psn_search->opcode_start_psn =
cpu_to_le32(opcd_spsn);
swq->psn_search->flags_next_psn =
cpu_to_le32(flg_npsn);
}
}
if (qp->type == CMDQ_CREATE_QP_TYPE_RC)
bnxt_qplib_fill_psn_search(qp, wqe, swq);
queue_err:
if (sch_handler) {
/* Store the ULP info in the software structures */
@ -1918,13 +1861,13 @@ void bnxt_qplib_post_recv_db(struct bnxt_qplib_qp *qp)
int bnxt_qplib_post_recv(struct bnxt_qplib_qp *qp,
struct bnxt_qplib_swqe *wqe)
{
struct bnxt_qplib_q *rq = &qp->rq;
struct rq_wqe *rqe, **rqe_ptr;
struct sq_sge *hw_sge;
struct bnxt_qplib_nq_work *nq_work = NULL;
struct bnxt_qplib_q *rq = &qp->rq;
bool sch_handler = false;
u32 sw_prod;
struct sq_sge *hw_sge;
struct rq_wqe *rqe;
int i, rc = 0;
u32 sw_prod;
if (qp->state == CMDQ_MODIFY_QP_NEW_STATE_ERR) {
sch_handler = true;
@ -1941,10 +1884,8 @@ int bnxt_qplib_post_recv(struct bnxt_qplib_qp *qp,
sw_prod = HWQ_CMP(rq->hwq.prod, &rq->hwq);
rq->swq[sw_prod].wr_id = wqe->wr_id;
rqe_ptr = (struct rq_wqe **)rq->hwq.pbl_ptr;
rqe = &rqe_ptr[RQE_PG(sw_prod)][RQE_IDX(sw_prod)];
memset(rqe, 0, BNXT_QPLIB_MAX_RQE_ENTRY_SIZE);
rqe = bnxt_qplib_get_qe(&rq->hwq, sw_prod, NULL);
memset(rqe, 0, rq->wqe_size);
/* Calculate wqe_size16 and data_len */
for (i = 0, hw_sge = (struct sq_sge *)rqe->data;
@ -1997,9 +1938,10 @@ int bnxt_qplib_create_cq(struct bnxt_qplib_res *res, struct bnxt_qplib_cq *cq)
struct bnxt_qplib_rcfw *rcfw = res->rcfw;
struct bnxt_qplib_hwq_attr hwq_attr = {};
struct creq_create_cq_resp resp;
struct cmdq_create_cq req;
struct bnxt_qplib_pbl *pbl;
struct cmdq_create_cq req;
u16 cmd_flags = 0;
u32 pg_sz_lvl;
int rc;
hwq_attr.res = res;
@ -2020,22 +1962,13 @@ int bnxt_qplib_create_cq(struct bnxt_qplib_res *res, struct bnxt_qplib_cq *cq)
}
req.dpi = cpu_to_le32(cq->dpi->dpi);
req.cq_handle = cpu_to_le64(cq->cq_handle);
req.cq_size = cpu_to_le32(cq->hwq.max_elements);
pbl = &cq->hwq.pbl[PBL_LVL_0];
req.pg_size_lvl = cpu_to_le32(
((cq->hwq.level & CMDQ_CREATE_CQ_LVL_MASK) <<
CMDQ_CREATE_CQ_LVL_SFT) |
(pbl->pg_size == ROCE_PG_SIZE_4K ? CMDQ_CREATE_CQ_PG_SIZE_PG_4K :
pbl->pg_size == ROCE_PG_SIZE_8K ? CMDQ_CREATE_CQ_PG_SIZE_PG_8K :
pbl->pg_size == ROCE_PG_SIZE_64K ? CMDQ_CREATE_CQ_PG_SIZE_PG_64K :
pbl->pg_size == ROCE_PG_SIZE_2M ? CMDQ_CREATE_CQ_PG_SIZE_PG_2M :
pbl->pg_size == ROCE_PG_SIZE_8M ? CMDQ_CREATE_CQ_PG_SIZE_PG_8M :
pbl->pg_size == ROCE_PG_SIZE_1G ? CMDQ_CREATE_CQ_PG_SIZE_PG_1G :
CMDQ_CREATE_CQ_PG_SIZE_PG_4K));
pg_sz_lvl = (bnxt_qplib_base_pg_size(&cq->hwq) <<
CMDQ_CREATE_CQ_PG_SIZE_SFT);
pg_sz_lvl |= (cq->hwq.level & CMDQ_CREATE_CQ_LVL_MASK);
req.pg_size_lvl = cpu_to_le32(pg_sz_lvl);
req.pbl = cpu_to_le64(pbl->pg_map_arr[0]);
req.cq_fco_cnq_id = cpu_to_le32(
(cq->cnq_hw_ring_id & CMDQ_CREATE_CQ_CNQ_ID_MASK) <<
CMDQ_CREATE_CQ_CNQ_ID_SFT);
@ -2194,13 +2127,13 @@ void bnxt_qplib_mark_qp_error(void *qp_handle)
static int do_wa9060(struct bnxt_qplib_qp *qp, struct bnxt_qplib_cq *cq,
u32 cq_cons, u32 sw_sq_cons, u32 cqe_sq_cons)
{
struct bnxt_qplib_q *sq = &qp->sq;
struct bnxt_qplib_swq *swq;
u32 peek_sw_cq_cons, peek_raw_cq_cons, peek_sq_cons_idx;
struct cq_base *peek_hwcqe, **peek_hw_cqe_ptr;
struct bnxt_qplib_q *sq = &qp->sq;
struct cq_req *peek_req_hwcqe;
struct bnxt_qplib_qp *peek_qp;
struct bnxt_qplib_q *peek_sq;
struct bnxt_qplib_swq *swq;
struct cq_base *peek_hwcqe;
int i, rc = 0;
/* Normal mode */
@ -2230,9 +2163,8 @@ static int do_wa9060(struct bnxt_qplib_qp *qp, struct bnxt_qplib_cq *cq,
i = cq->hwq.max_elements;
while (i--) {
peek_sw_cq_cons = HWQ_CMP((peek_sw_cq_cons), &cq->hwq);
peek_hw_cqe_ptr = (struct cq_base **)cq->hwq.pbl_ptr;
peek_hwcqe = &peek_hw_cqe_ptr[CQE_PG(peek_sw_cq_cons)]
[CQE_IDX(peek_sw_cq_cons)];
peek_hwcqe = bnxt_qplib_get_qe(&cq->hwq,
peek_sw_cq_cons, NULL);
/* If the next hwcqe is VALID */
if (CQE_CMP_VALID(peek_hwcqe, peek_raw_cq_cons,
cq->hwq.max_elements)) {
@ -2294,11 +2226,11 @@ static int bnxt_qplib_cq_process_req(struct bnxt_qplib_cq *cq,
struct bnxt_qplib_cqe **pcqe, int *budget,
u32 cq_cons, struct bnxt_qplib_qp **lib_qp)
{
struct bnxt_qplib_qp *qp;
struct bnxt_qplib_q *sq;
struct bnxt_qplib_cqe *cqe;
u32 sw_sq_cons, cqe_sq_cons;
struct bnxt_qplib_swq *swq;
struct bnxt_qplib_cqe *cqe;
struct bnxt_qplib_qp *qp;
struct bnxt_qplib_q *sq;
int rc = 0;
qp = (struct bnxt_qplib_qp *)((unsigned long)
@ -2408,10 +2340,10 @@ static int bnxt_qplib_cq_process_res_rc(struct bnxt_qplib_cq *cq,
struct bnxt_qplib_cqe **pcqe,
int *budget)
{
struct bnxt_qplib_qp *qp;
struct bnxt_qplib_q *rq;
struct bnxt_qplib_srq *srq;
struct bnxt_qplib_cqe *cqe;
struct bnxt_qplib_qp *qp;
struct bnxt_qplib_q *rq;
u32 wr_id_idx;
int rc = 0;
@ -2483,10 +2415,10 @@ static int bnxt_qplib_cq_process_res_ud(struct bnxt_qplib_cq *cq,
struct bnxt_qplib_cqe **pcqe,
int *budget)
{
struct bnxt_qplib_qp *qp;
struct bnxt_qplib_q *rq;
struct bnxt_qplib_srq *srq;
struct bnxt_qplib_cqe *cqe;
struct bnxt_qplib_qp *qp;
struct bnxt_qplib_q *rq;
u32 wr_id_idx;
int rc = 0;
@ -2561,15 +2493,13 @@ done:
bool bnxt_qplib_is_cq_empty(struct bnxt_qplib_cq *cq)
{
struct cq_base *hw_cqe, **hw_cqe_ptr;
struct cq_base *hw_cqe;
u32 sw_cons, raw_cons;
bool rc = true;
raw_cons = cq->hwq.cons;
sw_cons = HWQ_CMP(raw_cons, &cq->hwq);
hw_cqe_ptr = (struct cq_base **)cq->hwq.pbl_ptr;
hw_cqe = &hw_cqe_ptr[CQE_PG(sw_cons)][CQE_IDX(sw_cons)];
hw_cqe = bnxt_qplib_get_qe(&cq->hwq, sw_cons, NULL);
/* Check for Valid bit. If the CQE is valid, return false */
rc = !CQE_CMP_VALID(hw_cqe, raw_cons, cq->hwq.max_elements);
return rc;
@@ -2813,7 +2743,7 @@ int bnxt_qplib_process_flush_list(struct bnxt_qplib_cq *cq,
int bnxt_qplib_poll_cq(struct bnxt_qplib_cq *cq, struct bnxt_qplib_cqe *cqe,
int num_cqes, struct bnxt_qplib_qp **lib_qp)
{
struct cq_base *hw_cqe, **hw_cqe_ptr;
struct cq_base *hw_cqe;
u32 sw_cons, raw_cons;
int budget, rc = 0;
@@ -2822,8 +2752,7 @@ int bnxt_qplib_poll_cq(struct bnxt_qplib_cq *cq, struct bnxt_qplib_cqe *cqe,
while (budget) {
sw_cons = HWQ_CMP(raw_cons, &cq->hwq);
hw_cqe_ptr = (struct cq_base **)cq->hwq.pbl_ptr;
hw_cqe = &hw_cqe_ptr[CQE_PG(sw_cons)][CQE_IDX(sw_cons)];
hw_cqe = bnxt_qplib_get_qe(&cq->hwq, sw_cons, NULL);
/* Check for Valid bit */
if (!CQE_CMP_VALID(hw_cqe, raw_cons, cq->hwq.max_elements))


@@ -45,6 +45,7 @@ struct bnxt_qplib_srq {
struct bnxt_qplib_db_info dbinfo;
u64 srq_handle;
u32 id;
u16 wqe_size;
u32 max_wqe;
u32 max_sge;
u32 threshold;
@@ -65,38 +66,7 @@ struct bnxt_qplib_sge {
u32 size;
};
#define BNXT_QPLIB_MAX_SQE_ENTRY_SIZE sizeof(struct sq_send)
#define SQE_CNT_PER_PG (PAGE_SIZE / BNXT_QPLIB_MAX_SQE_ENTRY_SIZE)
#define SQE_MAX_IDX_PER_PG (SQE_CNT_PER_PG - 1)
static inline u32 get_sqe_pg(u32 val)
{
return ((val & ~SQE_MAX_IDX_PER_PG) / SQE_CNT_PER_PG);
}
static inline u32 get_sqe_idx(u32 val)
{
return (val & SQE_MAX_IDX_PER_PG);
}
#define BNXT_QPLIB_MAX_PSNE_ENTRY_SIZE sizeof(struct sq_psn_search)
#define PSNE_CNT_PER_PG (PAGE_SIZE / BNXT_QPLIB_MAX_PSNE_ENTRY_SIZE)
#define PSNE_MAX_IDX_PER_PG (PSNE_CNT_PER_PG - 1)
static inline u32 get_psne_pg(u32 val)
{
return ((val & ~PSNE_MAX_IDX_PER_PG) / PSNE_CNT_PER_PG);
}
static inline u32 get_psne_idx(u32 val)
{
return (val & PSNE_MAX_IDX_PER_PG);
}
#define BNXT_QPLIB_QP_MAX_SGL 6
struct bnxt_qplib_swq {
u64 wr_id;
int next_idx;
@@ -226,19 +196,13 @@ struct bnxt_qplib_swqe {
};
};
#define BNXT_QPLIB_MAX_RQE_ENTRY_SIZE sizeof(struct rq_wqe)
#define RQE_CNT_PER_PG (PAGE_SIZE / BNXT_QPLIB_MAX_RQE_ENTRY_SIZE)
#define RQE_MAX_IDX_PER_PG (RQE_CNT_PER_PG - 1)
#define RQE_PG(x) (((x) & ~RQE_MAX_IDX_PER_PG) / RQE_CNT_PER_PG)
#define RQE_IDX(x) ((x) & RQE_MAX_IDX_PER_PG)
struct bnxt_qplib_q {
struct bnxt_qplib_hwq hwq;
struct bnxt_qplib_swq *swq;
struct bnxt_qplib_db_info dbinfo;
struct bnxt_qplib_sg_info sg_info;
u32 max_wqe;
u16 wqe_size;
u16 q_full_delta;
u16 max_sge;
u32 psn;
@@ -256,7 +220,7 @@ struct bnxt_qplib_qp {
struct bnxt_qplib_dpi *dpi;
struct bnxt_qplib_chip_ctx *cctx;
u64 qp_handle;
#define BNXT_QPLIB_QP_ID_INVALID 0xFFFFFFFF
#define BNXT_QPLIB_QP_ID_INVALID 0xFFFFFFFF
u32 id;
u8 type;
u8 sig_type;


@@ -89,10 +89,9 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
struct creq_base *resp, void *sb, u8 is_block)
{
struct bnxt_qplib_cmdq_ctx *cmdq = &rcfw->cmdq;
struct bnxt_qplib_cmdqe *cmdqe, **hwq_ptr;
struct bnxt_qplib_hwq *hwq = &cmdq->hwq;
struct bnxt_qplib_crsqe *crsqe;
u32 cmdq_depth = rcfw->cmdq_depth;
struct bnxt_qplib_cmdqe *cmdqe;
u32 sw_prod, cmdq_prod;
struct pci_dev *pdev;
unsigned long flags;
@@ -163,13 +162,11 @@ static int __send_message(struct bnxt_qplib_rcfw *rcfw, struct cmdq_base *req,
BNXT_QPLIB_CMDQE_UNITS;
}
hwq_ptr = (struct bnxt_qplib_cmdqe **)hwq->pbl_ptr;
preq = (u8 *)req;
do {
/* Locate the next cmdq slot */
sw_prod = HWQ_CMP(hwq->prod, hwq);
cmdqe = &hwq_ptr[get_cmdq_pg(sw_prod, cmdq_depth)]
[get_cmdq_idx(sw_prod, cmdq_depth)];
cmdqe = bnxt_qplib_get_qe(hwq, sw_prod, NULL);
if (!cmdqe) {
dev_err(&pdev->dev,
"RCFW request failed with no cmdqe!\n");
@@ -378,7 +375,7 @@ static void bnxt_qplib_service_creq(unsigned long data)
struct bnxt_qplib_creq_ctx *creq = &rcfw->creq;
u32 type, budget = CREQ_ENTRY_POLL_BUDGET;
struct bnxt_qplib_hwq *hwq = &creq->hwq;
struct creq_base *creqe, **hwq_ptr;
struct creq_base *creqe;
u32 sw_cons, raw_cons;
unsigned long flags;
@@ -387,8 +384,7 @@ static void bnxt_qplib_service_creq(unsigned long data)
raw_cons = hwq->cons;
while (budget > 0) {
sw_cons = HWQ_CMP(raw_cons, hwq);
hwq_ptr = (struct creq_base **)hwq->pbl_ptr;
creqe = &hwq_ptr[get_creq_pg(sw_cons)][get_creq_idx(sw_cons)];
creqe = bnxt_qplib_get_qe(hwq, sw_cons, NULL);
if (!CREQ_CMP_VALID(creqe, raw_cons, hwq->max_elements))
break;
/* The valid test of the entry must be done first before
@@ -434,7 +430,6 @@ static irqreturn_t bnxt_qplib_creq_irq(int irq, void *dev_instance)
{
struct bnxt_qplib_rcfw *rcfw = dev_instance;
struct bnxt_qplib_creq_ctx *creq;
struct creq_base **creq_ptr;
struct bnxt_qplib_hwq *hwq;
u32 sw_cons;
@@ -442,8 +437,7 @@ static irqreturn_t bnxt_qplib_creq_irq(int irq, void *dev_instance)
hwq = &creq->hwq;
/* Prefetch the CREQ element */
sw_cons = HWQ_CMP(hwq->cons, hwq);
creq_ptr = (struct creq_base **)creq->hwq.pbl_ptr;
prefetch(&creq_ptr[get_creq_pg(sw_cons)][get_creq_idx(sw_cons)]);
prefetch(bnxt_qplib_get_qe(hwq, sw_cons, NULL));
tasklet_schedule(&creq->creq_tasklet);
@@ -468,29 +462,13 @@ int bnxt_qplib_deinit_rcfw(struct bnxt_qplib_rcfw *rcfw)
return 0;
}
static int __get_pbl_pg_idx(struct bnxt_qplib_pbl *pbl)
{
return (pbl->pg_size == ROCE_PG_SIZE_4K ?
CMDQ_INITIALIZE_FW_QPC_PG_SIZE_PG_4K :
pbl->pg_size == ROCE_PG_SIZE_8K ?
CMDQ_INITIALIZE_FW_QPC_PG_SIZE_PG_8K :
pbl->pg_size == ROCE_PG_SIZE_64K ?
CMDQ_INITIALIZE_FW_QPC_PG_SIZE_PG_64K :
pbl->pg_size == ROCE_PG_SIZE_2M ?
CMDQ_INITIALIZE_FW_QPC_PG_SIZE_PG_2M :
pbl->pg_size == ROCE_PG_SIZE_8M ?
CMDQ_INITIALIZE_FW_QPC_PG_SIZE_PG_8M :
pbl->pg_size == ROCE_PG_SIZE_1G ?
CMDQ_INITIALIZE_FW_QPC_PG_SIZE_PG_1G :
CMDQ_INITIALIZE_FW_QPC_PG_SIZE_PG_4K);
}
int bnxt_qplib_init_rcfw(struct bnxt_qplib_rcfw *rcfw,
struct bnxt_qplib_ctx *ctx, int is_virtfn)
{
struct cmdq_initialize_fw req;
struct creq_initialize_fw_resp resp;
u16 cmd_flags = 0, level;
struct cmdq_initialize_fw req;
u16 cmd_flags = 0;
u8 pgsz, lvl;
int rc;
RCFW_CMD_PREP(req, INITIALIZE_FW, cmd_flags);
@@ -511,32 +489,30 @@ int bnxt_qplib_init_rcfw(struct bnxt_qplib_rcfw *rcfw,
if (bnxt_qplib_is_chip_gen_p5(rcfw->res->cctx))
goto config_vf_res;
level = ctx->qpc_tbl.level;
req.qpc_pg_size_qpc_lvl = (level << CMDQ_INITIALIZE_FW_QPC_LVL_SFT) |
__get_pbl_pg_idx(&ctx->qpc_tbl.pbl[level]);
level = ctx->mrw_tbl.level;
req.mrw_pg_size_mrw_lvl = (level << CMDQ_INITIALIZE_FW_MRW_LVL_SFT) |
__get_pbl_pg_idx(&ctx->mrw_tbl.pbl[level]);
level = ctx->srqc_tbl.level;
req.srq_pg_size_srq_lvl = (level << CMDQ_INITIALIZE_FW_SRQ_LVL_SFT) |
__get_pbl_pg_idx(&ctx->srqc_tbl.pbl[level]);
level = ctx->cq_tbl.level;
req.cq_pg_size_cq_lvl = (level << CMDQ_INITIALIZE_FW_CQ_LVL_SFT) |
__get_pbl_pg_idx(&ctx->cq_tbl.pbl[level]);
level = ctx->srqc_tbl.level;
req.srq_pg_size_srq_lvl = (level << CMDQ_INITIALIZE_FW_SRQ_LVL_SFT) |
__get_pbl_pg_idx(&ctx->srqc_tbl.pbl[level]);
level = ctx->cq_tbl.level;
req.cq_pg_size_cq_lvl = (level << CMDQ_INITIALIZE_FW_CQ_LVL_SFT) |
__get_pbl_pg_idx(&ctx->cq_tbl.pbl[level]);
level = ctx->tim_tbl.level;
req.tim_pg_size_tim_lvl = (level << CMDQ_INITIALIZE_FW_TIM_LVL_SFT) |
__get_pbl_pg_idx(&ctx->tim_tbl.pbl[level]);
level = ctx->tqm_ctx.pde.level;
req.tqm_pg_size_tqm_lvl =
(level << CMDQ_INITIALIZE_FW_TQM_LVL_SFT) |
__get_pbl_pg_idx(&ctx->tqm_ctx.pde.pbl[level]);
lvl = ctx->qpc_tbl.level;
pgsz = bnxt_qplib_base_pg_size(&ctx->qpc_tbl);
req.qpc_pg_size_qpc_lvl = (pgsz << CMDQ_INITIALIZE_FW_QPC_PG_SIZE_SFT) |
lvl;
lvl = ctx->mrw_tbl.level;
pgsz = bnxt_qplib_base_pg_size(&ctx->mrw_tbl);
req.mrw_pg_size_mrw_lvl = (pgsz << CMDQ_INITIALIZE_FW_QPC_PG_SIZE_SFT) |
lvl;
lvl = ctx->srqc_tbl.level;
pgsz = bnxt_qplib_base_pg_size(&ctx->srqc_tbl);
req.srq_pg_size_srq_lvl = (pgsz << CMDQ_INITIALIZE_FW_QPC_PG_SIZE_SFT) |
lvl;
lvl = ctx->cq_tbl.level;
pgsz = bnxt_qplib_base_pg_size(&ctx->cq_tbl);
req.cq_pg_size_cq_lvl = (pgsz << CMDQ_INITIALIZE_FW_QPC_PG_SIZE_SFT) |
lvl;
lvl = ctx->tim_tbl.level;
pgsz = bnxt_qplib_base_pg_size(&ctx->tim_tbl);
req.tim_pg_size_tim_lvl = (pgsz << CMDQ_INITIALIZE_FW_QPC_PG_SIZE_SFT) |
lvl;
lvl = ctx->tqm_ctx.pde.level;
pgsz = bnxt_qplib_base_pg_size(&ctx->tqm_ctx.pde);
req.tqm_pg_size_tqm_lvl = (pgsz << CMDQ_INITIALIZE_FW_QPC_PG_SIZE_SFT) |
lvl;
req.qpc_page_dir =
cpu_to_le64(ctx->qpc_tbl.pbl[PBL_LVL_0].pg_map_arr[0]);
req.mrw_page_dir =


@@ -87,12 +87,6 @@ static inline u32 bnxt_qplib_cmdqe_page_size(u32 depth)
return (bnxt_qplib_cmdqe_npages(depth) * PAGE_SIZE);
}
static inline u32 bnxt_qplib_cmdqe_cnt_per_pg(u32 depth)
{
return (bnxt_qplib_cmdqe_page_size(depth) /
BNXT_QPLIB_CMDQE_UNITS);
}
/* Set the cmd_size to a factor of CMDQE unit */
static inline void bnxt_qplib_set_cmd_slots(struct cmdq_base *req)
{
@@ -100,30 +94,12 @@ static inline void bnxt_qplib_set_cmd_slots(struct cmdq_base *req)
BNXT_QPLIB_CMDQE_UNITS;
}
#define MAX_CMDQ_IDX(depth) ((depth) - 1)
static inline u32 bnxt_qplib_max_cmdq_idx_per_pg(u32 depth)
{
return (bnxt_qplib_cmdqe_cnt_per_pg(depth) - 1);
}
#define RCFW_MAX_COOKIE_VALUE 0x7FFF
#define RCFW_CMD_IS_BLOCKING 0x8000
#define RCFW_BLOCKED_CMD_WAIT_COUNT 0x4E20
#define HWRM_VERSION_RCFW_CMDQ_DEPTH_CHECK 0x1000900020011ULL
static inline u32 get_cmdq_pg(u32 val, u32 depth)
{
return (val & ~(bnxt_qplib_max_cmdq_idx_per_pg(depth))) /
(bnxt_qplib_cmdqe_cnt_per_pg(depth));
}
static inline u32 get_cmdq_idx(u32 val, u32 depth)
{
return val & (bnxt_qplib_max_cmdq_idx_per_pg(depth));
}
/* Crsq buf is 1024-Byte */
struct bnxt_qplib_crsbe {
u8 data[1024];
@@ -133,76 +109,9 @@ struct bnxt_qplib_crsbe {
/* Allocate 1 per QP for async error notification for now */
#define BNXT_QPLIB_CREQE_MAX_CNT (64 * 1024)
#define BNXT_QPLIB_CREQE_UNITS 16 /* 16-Bytes per prod unit */
#define BNXT_QPLIB_CREQE_CNT_PER_PG (PAGE_SIZE / BNXT_QPLIB_CREQE_UNITS)
#define MAX_CREQ_IDX (BNXT_QPLIB_CREQE_MAX_CNT - 1)
#define MAX_CREQ_IDX_PER_PG (BNXT_QPLIB_CREQE_CNT_PER_PG - 1)
static inline u32 get_creq_pg(u32 val)
{
return (val & ~MAX_CREQ_IDX_PER_PG) / BNXT_QPLIB_CREQE_CNT_PER_PG;
}
static inline u32 get_creq_idx(u32 val)
{
return val & MAX_CREQ_IDX_PER_PG;
}
#define BNXT_QPLIB_CREQE_PER_PG (PAGE_SIZE / sizeof(struct creq_base))
#define CREQ_CMP_VALID(hdr, raw_cons, cp_bit) \
(!!((hdr)->v & CREQ_BASE_V) == \
!((raw_cons) & (cp_bit)))
#define CREQ_DB_KEY_CP (0x2 << CMPL_DOORBELL_KEY_SFT)
#define CREQ_DB_IDX_VALID CMPL_DOORBELL_IDX_VALID
#define CREQ_DB_IRQ_DIS CMPL_DOORBELL_MASK
#define CREQ_DB_CP_FLAGS_REARM (CREQ_DB_KEY_CP | \
CREQ_DB_IDX_VALID)
#define CREQ_DB_CP_FLAGS (CREQ_DB_KEY_CP | \
CREQ_DB_IDX_VALID | \
CREQ_DB_IRQ_DIS)
static inline void bnxt_qplib_ring_creq_db64(void __iomem *db, u32 index,
u32 xid, bool arm)
{
u64 val = 0;
val = xid & DBC_DBC_XID_MASK;
val |= DBC_DBC_PATH_ROCE;
val |= arm ? DBC_DBC_TYPE_NQ_ARM : DBC_DBC_TYPE_NQ;
val <<= 32;
val |= index & DBC_DBC_INDEX_MASK;
writeq(val, db);
}
static inline void bnxt_qplib_ring_creq_db_rearm(void __iomem *db, u32 raw_cons,
u32 max_elements, u32 xid,
bool gen_p5)
{
u32 index = raw_cons & (max_elements - 1);
if (gen_p5)
bnxt_qplib_ring_creq_db64(db, index, xid, true);
else
writel(CREQ_DB_CP_FLAGS_REARM | (index & DBC_DBC32_XID_MASK),
db);
}
static inline void bnxt_qplib_ring_creq_db(void __iomem *db, u32 raw_cons,
u32 max_elements, u32 xid,
bool gen_p5)
{
u32 index = raw_cons & (max_elements - 1);
if (gen_p5)
bnxt_qplib_ring_creq_db64(db, index, xid, true);
else
writel(CREQ_DB_CP_FLAGS | (index & DBC_DBC32_XID_MASK),
db);
}
#define CREQ_ENTRY_POLL_BUDGET 0x100
/* HWQ */


@@ -347,6 +347,7 @@ done:
hwq->depth = hwq_attr->depth;
hwq->max_elements = depth;
hwq->element_size = stride;
hwq->qe_ppg = pg_size / stride;
/* For direct access to the elements */
lvl = hwq->level;
if (hwq_attr->sginfo->nopte && hwq->level)


@@ -80,6 +80,15 @@ enum bnxt_qplib_pbl_lvl {
#define ROCE_PG_SIZE_8M (8 * 1024 * 1024)
#define ROCE_PG_SIZE_1G (1024 * 1024 * 1024)
enum bnxt_qplib_hwrm_pg_size {
BNXT_QPLIB_HWRM_PG_SIZE_4K = 0,
BNXT_QPLIB_HWRM_PG_SIZE_8K = 1,
BNXT_QPLIB_HWRM_PG_SIZE_64K = 2,
BNXT_QPLIB_HWRM_PG_SIZE_2M = 3,
BNXT_QPLIB_HWRM_PG_SIZE_8M = 4,
BNXT_QPLIB_HWRM_PG_SIZE_1G = 5,
};
struct bnxt_qplib_reg_desc {
u8 bar_id;
resource_size_t bar_base;
@@ -126,6 +135,7 @@ struct bnxt_qplib_hwq {
u32 max_elements;
u32 depth;
u16 element_size; /* Size of each entry */
u16 qe_ppg; /* queue entries per page */
u32 prod; /* raw */
u32 cons; /* raw */
@@ -263,6 +273,49 @@ static inline u8 bnxt_qplib_get_ring_type(struct bnxt_qplib_chip_ctx *cctx)
RING_ALLOC_REQ_RING_TYPE_ROCE_CMPL;
}
static inline u8 bnxt_qplib_base_pg_size(struct bnxt_qplib_hwq *hwq)
{
u8 pg_size = BNXT_QPLIB_HWRM_PG_SIZE_4K;
struct bnxt_qplib_pbl *pbl;
pbl = &hwq->pbl[PBL_LVL_0];
switch (pbl->pg_size) {
case ROCE_PG_SIZE_4K:
pg_size = BNXT_QPLIB_HWRM_PG_SIZE_4K;
break;
case ROCE_PG_SIZE_8K:
pg_size = BNXT_QPLIB_HWRM_PG_SIZE_8K;
break;
case ROCE_PG_SIZE_64K:
pg_size = BNXT_QPLIB_HWRM_PG_SIZE_64K;
break;
case ROCE_PG_SIZE_2M:
pg_size = BNXT_QPLIB_HWRM_PG_SIZE_2M;
break;
case ROCE_PG_SIZE_8M:
pg_size = BNXT_QPLIB_HWRM_PG_SIZE_8M;
break;
case ROCE_PG_SIZE_1G:
pg_size = BNXT_QPLIB_HWRM_PG_SIZE_1G;
break;
default:
break;
}
return pg_size;
}
static inline void *bnxt_qplib_get_qe(struct bnxt_qplib_hwq *hwq,
u32 indx, u64 *pg)
{
u32 pg_num, pg_idx;
pg_num = (indx / hwq->qe_ppg);
pg_idx = (indx % hwq->qe_ppg);
if (pg)
*pg = (u64)&hwq->pbl_ptr[pg_num];
return (void *)(hwq->pbl_ptr[pg_num] + hwq->element_size * pg_idx);
}
#define to_bnxt_qplib(ptr, type, member) \
container_of(ptr, type, member)


@@ -132,9 +132,6 @@ int bnxt_qplib_get_dev_attr(struct bnxt_qplib_rcfw *rcfw,
attr->max_raw_ethy_qp = le32_to_cpu(sb->max_raw_eth_qp);
attr->max_ah = le32_to_cpu(sb->max_ah);
attr->max_fmr = le32_to_cpu(sb->max_fmr);
attr->max_map_per_fmr = sb->max_map_per_fmr;
attr->max_srq = le16_to_cpu(sb->max_srq);
attr->max_srq_wqes = le32_to_cpu(sb->max_srq_wr) - 1;
attr->max_srq_sges = sb->max_srq_sge;


@@ -64,8 +64,6 @@ struct bnxt_qplib_dev_attr {
u32 max_mw;
u32 max_raw_ethy_qp;
u32 max_ah;
u32 max_fmr;
u32 max_map_per_fmr;
u32 max_srq;
u32 max_srq_wqes;
u32 max_srq_sges;


@@ -210,6 +210,20 @@ struct sq_send {
__le32 data[24];
};
/* sq_send_hdr (size:256b/32B) */
struct sq_send_hdr {
u8 wqe_type;
u8 flags;
u8 wqe_size;
u8 reserved8_1;
__le32 inv_key_or_imm_data;
__le32 length;
__le32 q_key;
__le32 dst_qp;
__le32 avid;
__le64 reserved64;
};
/* Send Raw Ethernet and QP1 SQ WQE (40 bytes) */
struct sq_send_raweth_qp1 {
u8 wqe_type;
@@ -265,6 +279,21 @@ struct sq_send_raweth_qp1 {
__le32 data[24];
};
/* sq_send_raweth_qp1_hdr (size:256b/32B) */
struct sq_send_raweth_qp1_hdr {
u8 wqe_type;
u8 flags;
u8 wqe_size;
u8 reserved8;
__le16 lflags;
__le16 cfa_action;
__le32 length;
__le32 reserved32_1;
__le32 cfa_meta;
__le32 reserved32_2;
__le64 reserved64;
};
/* RDMA SQ WQE (40 bytes) */
struct sq_rdma {
u8 wqe_type;
@@ -288,6 +317,20 @@ struct sq_rdma {
__le32 data[24];
};
/* sq_rdma_hdr (size:256b/32B) */
struct sq_rdma_hdr {
u8 wqe_type;
u8 flags;
u8 wqe_size;
u8 reserved8;
__le32 imm_data;
__le32 length;
__le32 reserved32_1;
__le64 remote_va;
__le32 remote_key;
__le32 reserved32_2;
};
/* Atomic SQ WQE (40 bytes) */
struct sq_atomic {
u8 wqe_type;
@@ -307,6 +350,17 @@ struct sq_atomic {
__le32 data[24];
};
/* sq_atomic_hdr (size:256b/32B) */
struct sq_atomic_hdr {
u8 wqe_type;
u8 flags;
__le16 reserved16;
__le32 remote_key;
__le64 remote_va;
__le64 swap_data;
__le64 cmp_data;
};
/* Local Invalidate SQ WQE (40 bytes) */
struct sq_localinvalidate {
u8 wqe_type;
@@ -324,6 +378,16 @@ struct sq_localinvalidate {
__le32 data[24];
};
/* sq_localinvalidate_hdr (size:256b/32B) */
struct sq_localinvalidate_hdr {
u8 wqe_type;
u8 flags;
__le16 reserved16;
__le32 inv_l_key;
__le64 reserved64;
u8 reserved128[16];
};
/* FR-PMR SQ WQE (40 bytes) */
struct sq_fr_pmr {
u8 wqe_type;
@@ -380,6 +444,21 @@ struct sq_fr_pmr {
__le32 data[24];
};
/* sq_fr_pmr_hdr (size:256b/32B) */
struct sq_fr_pmr_hdr {
u8 wqe_type;
u8 flags;
u8 access_cntl;
u8 zero_based_page_size_log;
__le32 l_key;
u8 length[5];
u8 reserved8_1;
u8 reserved8_2;
u8 numlevels_pbl_page_size_log;
__le64 pblptr;
__le64 va;
};
/* Bind SQ WQE (40 bytes) */
struct sq_bind {
u8 wqe_type;
@@ -417,6 +496,22 @@ struct sq_bind {
#define SQ_BIND_DATA_SFT 0
};
/* sq_bind_hdr (size:256b/32B) */
struct sq_bind_hdr {
u8 wqe_type;
u8 flags;
u8 access_cntl;
u8 reserved8_1;
u8 mw_type_zero_based;
u8 reserved8_2;
__le16 reserved16;
__le32 parent_l_key;
__le32 l_key;
__le64 va;
u8 length[5];
u8 reserved24[3];
};
/* RQ/SRQ WQE Structures */
/* RQ/SRQ WQE (40 bytes) */
struct rq_wqe {
@@ -435,6 +530,17 @@ struct rq_wqe {
__le32 data[24];
};
/* rq_wqe_hdr (size:256b/32B) */
struct rq_wqe_hdr {
u8 wqe_type;
u8 flags;
u8 wqe_size;
u8 reserved8;
__le32 reserved32;
__le32 wr_id[2];
u8 reserved128[16];
};
/* CQ CQE Structures */
/* Base CQE (32 bytes) */
struct cq_base {


@@ -953,6 +953,7 @@ void c4iw_dealloc(struct uld_ctx *ctx)
static void c4iw_remove(struct uld_ctx *ctx)
{
pr_debug("c4iw_dev %p\n", ctx->dev);
debugfs_remove_recursive(ctx->dev->debugfs_root);
c4iw_unregister_device(ctx->dev);
c4iw_dealloc(ctx);
}


@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */
/*
* Copyright 2018-2019 Amazon.com, Inc. or its affiliates. All rights reserved.
* Copyright 2018-2020 Amazon.com, Inc. or its affiliates. All rights reserved.
*/
#ifndef _EFA_H_
@@ -40,6 +40,7 @@ struct efa_sw_stats {
atomic64_t reg_mr_err;
atomic64_t alloc_ucontext_err;
atomic64_t create_ah_err;
atomic64_t mmap_err;
};
/* Don't use anything other than atomic64 */
@@ -153,8 +154,7 @@ int efa_mmap(struct ib_ucontext *ibucontext,
struct vm_area_struct *vma);
void efa_mmap_free(struct rdma_user_mmap_entry *rdma_entry);
int efa_create_ah(struct ib_ah *ibah,
struct rdma_ah_attr *ah_attr,
u32 flags,
struct rdma_ah_init_attr *init_attr,
struct ib_udata *udata);
void efa_destroy_ah(struct ib_ah *ibah, u32 flags);
int efa_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,


@@ -37,7 +37,7 @@ enum efa_admin_aq_feature_id {
EFA_ADMIN_NETWORK_ATTR = 3,
EFA_ADMIN_QUEUE_ATTR = 4,
EFA_ADMIN_HW_HINTS = 5,
EFA_ADMIN_FEATURES_OPCODE_NUM = 8,
EFA_ADMIN_HOST_INFO = 6,
};
/* QP transport type */
@@ -799,6 +799,54 @@ struct efa_admin_mmio_req_read_less_resp {
u32 reg_val;
};
enum efa_admin_os_type {
EFA_ADMIN_OS_LINUX = 0,
};
struct efa_admin_host_info {
/* OS distribution string format */
u8 os_dist_str[128];
/* Defined in enum efa_admin_os_type */
u32 os_type;
/* Kernel version string format */
u8 kernel_ver_str[32];
/* Kernel version numeric format */
u32 kernel_ver;
/*
* 7:0 : driver_module_type
* 15:8 : driver_sub_minor
* 23:16 : driver_minor
* 31:24 : driver_major
*/
u32 driver_ver;
/*
* Device's Bus, Device and Function
* 2:0 : function
* 7:3 : device
* 15:8 : bus
*/
u16 bdf;
/*
* Spec version
* 7:0 : spec_minor
* 15:8 : spec_major
*/
u16 spec_ver;
/*
* 0 : intree - Intree driver
* 1 : gdr - GPUDirect RDMA supported
* 31:2 : reserved2
*/
u32 flags;
};
/* create_qp_cmd */
#define EFA_ADMIN_CREATE_QP_CMD_SQ_VIRT_MASK BIT(0)
#define EFA_ADMIN_CREATE_QP_CMD_RQ_VIRT_MASK BIT(1)
@@ -820,4 +868,17 @@ struct efa_admin_mmio_req_read_less_resp {
/* feature_device_attr_desc */
#define EFA_ADMIN_FEATURE_DEVICE_ATTR_DESC_RDMA_READ_MASK BIT(0)
/* host_info */
#define EFA_ADMIN_HOST_INFO_DRIVER_MODULE_TYPE_MASK GENMASK(7, 0)
#define EFA_ADMIN_HOST_INFO_DRIVER_SUB_MINOR_MASK GENMASK(15, 8)
#define EFA_ADMIN_HOST_INFO_DRIVER_MINOR_MASK GENMASK(23, 16)
#define EFA_ADMIN_HOST_INFO_DRIVER_MAJOR_MASK GENMASK(31, 24)
#define EFA_ADMIN_HOST_INFO_FUNCTION_MASK GENMASK(2, 0)
#define EFA_ADMIN_HOST_INFO_DEVICE_MASK GENMASK(7, 3)
#define EFA_ADMIN_HOST_INFO_BUS_MASK GENMASK(15, 8)
#define EFA_ADMIN_HOST_INFO_SPEC_MINOR_MASK GENMASK(7, 0)
#define EFA_ADMIN_HOST_INFO_SPEC_MAJOR_MASK GENMASK(15, 8)
#define EFA_ADMIN_HOST_INFO_INTREE_MASK BIT(0)
#define EFA_ADMIN_HOST_INFO_GDR_MASK BIT(1)
#endif /* _EFA_ADMIN_CMDS_H_ */


@@ -631,17 +631,20 @@ int efa_com_cmd_exec(struct efa_com_admin_queue *aq,
cmd->aq_common_descriptor.opcode, PTR_ERR(comp_ctx));
up(&aq->avail_cmds);
atomic64_inc(&aq->stats.cmd_err);
return PTR_ERR(comp_ctx);
}
err = efa_com_wait_and_process_admin_cq(comp_ctx, aq);
if (err)
if (err) {
ibdev_err_ratelimited(
aq->efa_dev,
"Failed to process command %s (opcode %u) comp_status %d err %d\n",
efa_com_cmd_str(cmd->aq_common_descriptor.opcode),
cmd->aq_common_descriptor.opcode, comp_ctx->comp_status,
err);
atomic64_inc(&aq->stats.cmd_err);
}
up(&aq->avail_cmds);


@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */
/*
* Copyright 2018-2019 Amazon.com, Inc. or its affiliates. All rights reserved.
* Copyright 2018-2020 Amazon.com, Inc. or its affiliates. All rights reserved.
*/
#ifndef _EFA_COM_H_
@@ -47,6 +47,7 @@ struct efa_com_admin_sq {
struct efa_com_stats_admin {
atomic64_t submitted_cmd;
atomic64_t completed_cmd;
atomic64_t cmd_err;
atomic64_t no_completion;
};


@@ -351,7 +351,7 @@ int efa_com_destroy_ah(struct efa_com_dev *edev,
return 0;
}
static bool
bool
efa_com_check_supported_feature_id(struct efa_com_dev *edev,
enum efa_admin_aq_feature_id feature_id)
{
@@ -388,7 +388,7 @@ static int efa_com_get_feature_ex(struct efa_com_dev *edev,
if (control_buff_size)
EFA_SET(&get_cmd.aq_common_descriptor.flags,
EFA_ADMIN_AQ_COMMON_DESC_CTRL_DATA_INDIRECT, 1);
EFA_ADMIN_AQ_COMMON_DESC_CTRL_DATA, 1);
efa_com_set_dma_addr(control_buf_dma_addr,
&get_cmd.control_buffer.address.mem_addr_high,
@@ -517,12 +517,12 @@ int efa_com_get_hw_hints(struct efa_com_dev *edev,
return 0;
}
static int efa_com_set_feature_ex(struct efa_com_dev *edev,
struct efa_admin_set_feature_resp *set_resp,
struct efa_admin_set_feature_cmd *set_cmd,
enum efa_admin_aq_feature_id feature_id,
dma_addr_t control_buf_dma_addr,
u32 control_buff_size)
int efa_com_set_feature_ex(struct efa_com_dev *edev,
struct efa_admin_set_feature_resp *set_resp,
struct efa_admin_set_feature_cmd *set_cmd,
enum efa_admin_aq_feature_id feature_id,
dma_addr_t control_buf_dma_addr,
u32 control_buff_size)
{
struct efa_com_admin_queue *aq;
int err;
@@ -540,7 +540,7 @@ static int efa_com_set_feature_ex(struct efa_com_dev *edev,
if (control_buff_size) {
set_cmd->aq_common_descriptor.flags = 0;
EFA_SET(&set_cmd->aq_common_descriptor.flags,
EFA_ADMIN_AQ_COMMON_DESC_CTRL_DATA_INDIRECT, 1);
EFA_ADMIN_AQ_COMMON_DESC_CTRL_DATA, 1);
efa_com_set_dma_addr(control_buf_dma_addr,
&set_cmd->control_buffer.address.mem_addr_high,
&set_cmd->control_buffer.address.mem_addr_low);


@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */
/*
* Copyright 2018-2019 Amazon.com, Inc. or its affiliates. All rights reserved.
* Copyright 2018-2020 Amazon.com, Inc. or its affiliates. All rights reserved.
*/
#ifndef _EFA_COM_CMD_H_
@@ -270,6 +270,15 @@ int efa_com_get_device_attr(struct efa_com_dev *edev,
struct efa_com_get_device_attr_result *result);
int efa_com_get_hw_hints(struct efa_com_dev *edev,
struct efa_com_get_hw_hints_result *result);
bool
efa_com_check_supported_feature_id(struct efa_com_dev *edev,
enum efa_admin_aq_feature_id feature_id);
int efa_com_set_feature_ex(struct efa_com_dev *edev,
struct efa_admin_set_feature_resp *set_resp,
struct efa_admin_set_feature_cmd *set_cmd,
enum efa_admin_aq_feature_id feature_id,
dma_addr_t control_buf_dma_addr,
u32 control_buff_size);
int efa_com_set_aenq_config(struct efa_com_dev *edev, u32 groups);
int efa_com_alloc_pd(struct efa_com_dev *edev,
struct efa_com_alloc_pd_result *result);


@@ -1,10 +1,12 @@
// SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause
/*
* Copyright 2018-2019 Amazon.com, Inc. or its affiliates. All rights reserved.
* Copyright 2018-2020 Amazon.com, Inc. or its affiliates. All rights reserved.
*/
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/utsname.h>
#include <linux/version.h>
#include <rdma/ib_user_verbs.h>
@@ -187,6 +189,52 @@ static void efa_stats_init(struct efa_dev *dev)
atomic64_set(s, 0);
}
static void efa_set_host_info(struct efa_dev *dev)
{
struct efa_admin_set_feature_resp resp = {};
struct efa_admin_set_feature_cmd cmd = {};
struct efa_admin_host_info *hinf;
u32 bufsz = sizeof(*hinf);
dma_addr_t hinf_dma;
if (!efa_com_check_supported_feature_id(&dev->edev,
EFA_ADMIN_HOST_INFO))
return;
/* Failures in host info set shall not disturb probe */
hinf = dma_alloc_coherent(&dev->pdev->dev, bufsz, &hinf_dma,
GFP_KERNEL);
if (!hinf)
return;
strlcpy(hinf->os_dist_str, utsname()->release,
min(sizeof(hinf->os_dist_str), sizeof(utsname()->release)));
hinf->os_type = EFA_ADMIN_OS_LINUX;
strlcpy(hinf->kernel_ver_str, utsname()->version,
min(sizeof(hinf->kernel_ver_str), sizeof(utsname()->version)));
hinf->kernel_ver = LINUX_VERSION_CODE;
EFA_SET(&hinf->driver_ver, EFA_ADMIN_HOST_INFO_DRIVER_MAJOR, 0);
EFA_SET(&hinf->driver_ver, EFA_ADMIN_HOST_INFO_DRIVER_MINOR, 0);
EFA_SET(&hinf->driver_ver, EFA_ADMIN_HOST_INFO_DRIVER_SUB_MINOR, 0);
EFA_SET(&hinf->driver_ver, EFA_ADMIN_HOST_INFO_DRIVER_MODULE_TYPE, 0);
EFA_SET(&hinf->bdf, EFA_ADMIN_HOST_INFO_BUS, dev->pdev->bus->number);
EFA_SET(&hinf->bdf, EFA_ADMIN_HOST_INFO_DEVICE,
PCI_SLOT(dev->pdev->devfn));
EFA_SET(&hinf->bdf, EFA_ADMIN_HOST_INFO_FUNCTION,
PCI_FUNC(dev->pdev->devfn));
EFA_SET(&hinf->spec_ver, EFA_ADMIN_HOST_INFO_SPEC_MAJOR,
EFA_COMMON_SPEC_VERSION_MAJOR);
EFA_SET(&hinf->spec_ver, EFA_ADMIN_HOST_INFO_SPEC_MINOR,
EFA_COMMON_SPEC_VERSION_MINOR);
EFA_SET(&hinf->flags, EFA_ADMIN_HOST_INFO_INTREE, 1);
EFA_SET(&hinf->flags, EFA_ADMIN_HOST_INFO_GDR, 0);
efa_com_set_feature_ex(&dev->edev, &resp, &cmd, EFA_ADMIN_HOST_INFO,
hinf_dma, bufsz);
dma_free_coherent(&dev->pdev->dev, bufsz, hinf, hinf_dma);
}
static const struct ib_device_ops efa_dev_ops = {
.owner = THIS_MODULE,
.driver_id = RDMA_DRIVER_EFA,
@@ -251,6 +299,8 @@ static int efa_ib_device_add(struct efa_dev *dev)
if (err)
goto err_release_doorbell_bar;
efa_set_host_info(dev);
dev->ibdev.node_type = RDMA_NODE_UNSPECIFIED;
dev->ibdev.phys_port_cnt = 1;
dev->ibdev.num_comp_vectors = 1;


@@ -37,13 +37,16 @@ struct efa_user_mmap_entry {
op(EFA_RX_DROPS, "rx_drops") \
op(EFA_SUBMITTED_CMDS, "submitted_cmds") \
op(EFA_COMPLETED_CMDS, "completed_cmds") \
op(EFA_CMDS_ERR, "cmds_err") \
op(EFA_NO_COMPLETION_CMDS, "no_completion_cmds") \
op(EFA_KEEP_ALIVE_RCVD, "keep_alive_rcvd") \
op(EFA_ALLOC_PD_ERR, "alloc_pd_err") \
op(EFA_CREATE_QP_ERR, "create_qp_err") \
op(EFA_CREATE_CQ_ERR, "create_cq_err") \
op(EFA_REG_MR_ERR, "reg_mr_err") \
op(EFA_ALLOC_UCONTEXT_ERR, "alloc_ucontext_err") \
op(EFA_CREATE_AH_ERR, "create_ah_err")
op(EFA_CREATE_AH_ERR, "create_ah_err") \
op(EFA_MMAP_ERR, "mmap_err")
#define EFA_STATS_ENUM(ename, name) ename,
#define EFA_STATS_STR(ename, name) [ename] = name,
@@ -1568,6 +1571,7 @@ static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
ibdev_dbg(&dev->ibdev,
"pgoff[%#lx] does not have valid entry\n",
vma->vm_pgoff);
atomic64_inc(&dev->stats.sw_stats.mmap_err);
return -EINVAL;
}
entry = to_emmap(rdma_entry);
@@ -1603,12 +1607,14 @@ static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
err = -EINVAL;
}
if (err)
if (err) {
ibdev_dbg(
&dev->ibdev,
"Couldn't mmap address[%#llx] length[%#zx] mmap_flag[%d] err[%d]\n",
entry->address, rdma_entry->npages * PAGE_SIZE,
entry->mmap_flag, err);
atomic64_inc(&dev->stats.sw_stats.mmap_err);
}
rdma_user_mmap_entry_put(rdma_entry);
return err;
@@ -1639,10 +1645,10 @@ static int efa_ah_destroy(struct efa_dev *dev, struct efa_ah *ah)
}
int efa_create_ah(struct ib_ah *ibah,
struct rdma_ah_attr *ah_attr,
u32 flags,
struct rdma_ah_init_attr *init_attr,
struct ib_udata *udata)
{
struct rdma_ah_attr *ah_attr = init_attr->ah_attr;
struct efa_dev *dev = to_edev(ibah->device);
struct efa_com_create_ah_params params = {};
struct efa_ibv_create_ah_resp resp = {};
@@ -1650,7 +1656,7 @@ int efa_create_ah(struct ib_ah *ibah,
struct efa_ah *ah = to_eah(ibah);
int err;
if (!(flags & RDMA_CREATE_AH_SLEEPABLE)) {
if (!(init_attr->flags & RDMA_CREATE_AH_SLEEPABLE)) {
ibdev_dbg(&dev->ibdev,
"Create address handle is not supported in atomic context\n");
err = -EOPNOTSUPP;
@@ -1747,15 +1753,18 @@ int efa_get_hw_stats(struct ib_device *ibdev, struct rdma_hw_stats *stats,
as = &dev->edev.aq.stats;
stats->value[EFA_SUBMITTED_CMDS] = atomic64_read(&as->submitted_cmd);
stats->value[EFA_COMPLETED_CMDS] = atomic64_read(&as->completed_cmd);
stats->value[EFA_CMDS_ERR] = atomic64_read(&as->cmd_err);
stats->value[EFA_NO_COMPLETION_CMDS] = atomic64_read(&as->no_completion);
s = &dev->stats;
stats->value[EFA_KEEP_ALIVE_RCVD] = atomic64_read(&s->keep_alive_rcvd);
stats->value[EFA_ALLOC_PD_ERR] = atomic64_read(&s->sw_stats.alloc_pd_err);
stats->value[EFA_CREATE_QP_ERR] = atomic64_read(&s->sw_stats.create_qp_err);
stats->value[EFA_CREATE_CQ_ERR] = atomic64_read(&s->sw_stats.create_cq_err);
stats->value[EFA_REG_MR_ERR] = atomic64_read(&s->sw_stats.reg_mr_err);
stats->value[EFA_ALLOC_UCONTEXT_ERR] = atomic64_read(&s->sw_stats.alloc_ucontext_err);
stats->value[EFA_CREATE_AH_ERR] = atomic64_read(&s->sw_stats.create_ah_err);
stats->value[EFA_MMAP_ERR] = atomic64_read(&s->sw_stats.mmap_err);
return ARRAY_SIZE(efa_stats_names);
}


@@ -22,9 +22,13 @@ hfi1-y := \
init.o \
intr.o \
iowait.o \
ipoib_main.o \
ipoib_rx.o \
ipoib_tx.o \
mad.o \
mmu_rb.o \
msix.o \
netdev_rx.o \
opfn.o \
pcie.o \
pio.o \


@@ -1,5 +1,5 @@
/*
* Copyright(c) 2015 - 2018 Intel Corporation.
* Copyright(c) 2015 - 2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@@ -64,6 +64,7 @@ struct hfi1_affinity_node_list node_affinity = {
static const char * const irq_type_names[] = {
"SDMA",
"RCVCTXT",
"NETDEVCTXT",
"GENERAL",
"OTHER",
};
@@ -915,6 +916,11 @@ static int get_irq_affinity(struct hfi1_devdata *dd,
set = &entry->rcv_intr;
scnprintf(extra, 64, "ctxt %u", rcd->ctxt);
break;
case IRQ_NETDEVCTXT:
rcd = (struct hfi1_ctxtdata *)msix->arg;
set = &entry->def_intr;
scnprintf(extra, 64, "ctxt %u", rcd->ctxt);
break;
default:
dd_dev_err(dd, "Invalid IRQ type %d\n", msix->type);
return -EINVAL;
@@ -987,6 +993,10 @@ void hfi1_put_irq_affinity(struct hfi1_devdata *dd,
if (rcd->ctxt != HFI1_CTRL_CTXT)
set = &entry->rcv_intr;
break;
case IRQ_NETDEVCTXT:
rcd = (struct hfi1_ctxtdata *)msix->arg;
set = &entry->def_intr;
break;
default:
mutex_unlock(&node_affinity.lock);
return;


@@ -1,5 +1,5 @@
/*
* Copyright(c) 2015 - 2018 Intel Corporation.
* Copyright(c) 2015 - 2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@@ -52,6 +52,7 @@
enum irq_type {
IRQ_SDMA,
IRQ_RCVCTXT,
IRQ_NETDEVCTXT,
IRQ_GENERAL,
IRQ_OTHER
};


@@ -1,5 +1,5 @@
/*
* Copyright(c) 2015 - 2018 Intel Corporation.
* Copyright(c) 2015 - 2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@@ -66,10 +66,7 @@
#include "affinity.h"
#include "debugfs.h"
#include "fault.h"
uint kdeth_qp;
module_param_named(kdeth_qp, kdeth_qp, uint, S_IRUGO);
MODULE_PARM_DESC(kdeth_qp, "Set the KDETH queue pair prefix");
#include "netdev.h"
uint num_vls = HFI1_MAX_VLS_SUPPORTED;
module_param(num_vls, uint, S_IRUGO);
@@ -128,13 +125,15 @@ struct flag_table {
/*
* RSM instance allocation
* 0 - Verbs
* 1 - User Fecn Handling
* 2 - Vnic
* 0 - User Fecn Handling
* 1 - Vnic
* 2 - AIP
* 3 - Verbs
*/
#define RSM_INS_VERBS 0
#define RSM_INS_FECN 1
#define RSM_INS_VNIC 2
#define RSM_INS_FECN 0
#define RSM_INS_VNIC 1
#define RSM_INS_AIP 2
#define RSM_INS_VERBS 3
/* Bit offset into the GUID which carries HFI id information */
#define GUID_HFI_INDEX_SHIFT 39
@@ -175,6 +174,25 @@ struct flag_table {
/* QPN[m+n:1] QW 1, OFFSET 1 */
#define QPN_SELECT_OFFSET ((1ull << QW_SHIFT) | (1ull))
/* RSM fields for AIP */
/* LRH.BTH above is reused for this rule */
/* BTH.DESTQP: QW 1, OFFSET 16 for match */
#define BTH_DESTQP_QW 1ull
#define BTH_DESTQP_BIT_OFFSET 16ull
#define BTH_DESTQP_OFFSET(off) ((BTH_DESTQP_QW << QW_SHIFT) | (off))
#define BTH_DESTQP_MATCH_OFFSET BTH_DESTQP_OFFSET(BTH_DESTQP_BIT_OFFSET)
#define BTH_DESTQP_MASK 0xFFull
#define BTH_DESTQP_VALUE 0x81ull
/* DETH.SQPN: QW 3 Offset 56 for select */
/* We use the 8 most significant Source QPN bits as entropy for AIP */
#define DETH_AIP_SQPN_QW 3ull
#define DETH_AIP_SQPN_BIT_OFFSET 56ull
#define DETH_AIP_SQPN_OFFSET(off) ((DETH_AIP_SQPN_QW << QW_SHIFT) | (off))
#define DETH_AIP_SQPN_SELECT_OFFSET \
DETH_AIP_SQPN_OFFSET(DETH_AIP_SQPN_BIT_OFFSET)
/* RSM fields for Vnic */
/* L2_TYPE: QW 0, OFFSET 61 - for match */
#define L2_TYPE_QW 0ull
@ -8463,6 +8481,49 @@ static void hfi1_rcd_eoi_intr(struct hfi1_ctxtdata *rcd)
local_irq_restore(flags);
}
/**
* hfi1_netdev_rx_napi - napi poll function to move eoi inline
* @napi: pointer to napi object
* @budget: netdev budget
*/
int hfi1_netdev_rx_napi(struct napi_struct *napi, int budget)
{
struct hfi1_netdev_rxq *rxq = container_of(napi,
struct hfi1_netdev_rxq, napi);
struct hfi1_ctxtdata *rcd = rxq->rcd;
int work_done = 0;
work_done = rcd->do_interrupt(rcd, budget);
if (work_done < budget) {
napi_complete_done(napi, work_done);
hfi1_rcd_eoi_intr(rcd);
}
return work_done;
}
/* Receive packet napi handler for netdevs VNIC and AIP */
irqreturn_t receive_context_interrupt_napi(int irq, void *data)
{
struct hfi1_ctxtdata *rcd = data;
receive_interrupt_common(rcd);
if (likely(rcd->napi)) {
if (likely(napi_schedule_prep(rcd->napi)))
__napi_schedule_irqoff(rcd->napi);
else
__hfi1_rcd_eoi_intr(rcd);
} else {
WARN_ONCE(1, "Napi IRQ handler without napi set up ctxt=%d\n",
rcd->ctxt);
__hfi1_rcd_eoi_intr(rcd);
}
return IRQ_HANDLED;
}
/*
* Receive packet IRQ handler. This routine expects to be on its own IRQ.
* This routine will try to handle packets immediately (latency), but if
@ -13330,13 +13391,12 @@ static int set_up_interrupts(struct hfi1_devdata *dd)
* in array of contexts
* freectxts - number of free user contexts
* num_send_contexts - number of PIO send contexts being used
* num_vnic_contexts - number of contexts reserved for VNIC
* num_netdev_contexts - number of contexts reserved for netdev
*/
static int set_up_context_variables(struct hfi1_devdata *dd)
{
unsigned long num_kernel_contexts;
u16 num_vnic_contexts = HFI1_NUM_VNIC_CTXT;
int total_contexts;
u16 num_netdev_contexts;
int ret;
unsigned ngroups;
int rmt_count;
@ -13373,13 +13433,6 @@ static int set_up_context_variables(struct hfi1_devdata *dd)
num_kernel_contexts = send_contexts - num_vls - 1;
}
/* Accommodate VNIC contexts if possible */
if ((num_kernel_contexts + num_vnic_contexts) > rcv_contexts) {
dd_dev_err(dd, "No receive contexts available for VNIC\n");
num_vnic_contexts = 0;
}
total_contexts = num_kernel_contexts + num_vnic_contexts;
/*
* User contexts:
* - default to 1 user context per real (non-HT) CPU core if
@ -13392,28 +13445,32 @@ static int set_up_context_variables(struct hfi1_devdata *dd)
/*
* Adjust the counts given a global max.
*/
if (total_contexts + n_usr_ctxts > rcv_contexts) {
if (num_kernel_contexts + n_usr_ctxts > rcv_contexts) {
dd_dev_err(dd,
"Reducing # user receive contexts to: %d, from %u\n",
rcv_contexts - total_contexts,
"Reducing # user receive contexts to: %u, from %u\n",
(u32)(rcv_contexts - num_kernel_contexts),
n_usr_ctxts);
/* recalculate */
n_usr_ctxts = rcv_contexts - total_contexts;
n_usr_ctxts = rcv_contexts - num_kernel_contexts;
}
num_netdev_contexts =
hfi1_num_netdev_contexts(dd, rcv_contexts -
(num_kernel_contexts + n_usr_ctxts),
&node_affinity.real_cpu_mask);
/*
* The RMT entries are currently allocated as shown below:
* 1. QOS (0 to 128 entries);
* 2. FECN (num_kernel_context - 1 + num_user_contexts +
* num_vnic_contexts);
* 3. VNIC (num_vnic_contexts).
* It should be noted that FECN oversubscribe num_vnic_contexts
* entries of RMT because both VNIC and PSM could allocate any receive
* num_netdev_contexts);
* 3. netdev (num_netdev_contexts).
* It should be noted that FECN oversubscribes num_netdev_contexts
* entries of RMT because both netdev and PSM could allocate any receive
* context between dd->first_dyn_alloc_ctxt and dd->num_rcv_contexts,
* and PSM FECN must reserve an RMT entry for each possible PSM receive
* context.
*/
rmt_count = qos_rmt_entries(dd, NULL, NULL) + (num_vnic_contexts * 2);
rmt_count = qos_rmt_entries(dd, NULL, NULL) + (num_netdev_contexts * 2);
if (HFI1_CAP_IS_KSET(TID_RDMA))
rmt_count += num_kernel_contexts - 1;
if (rmt_count + n_usr_ctxts > NUM_MAP_ENTRIES) {
@ -13426,21 +13483,20 @@ static int set_up_context_variables(struct hfi1_devdata *dd)
n_usr_ctxts = user_rmt_reduced;
}
total_contexts += n_usr_ctxts;
/* the first N are kernel contexts, the rest are user/vnic contexts */
dd->num_rcv_contexts = total_contexts;
/* the first N are kernel contexts, the rest are user/netdev contexts */
dd->num_rcv_contexts =
num_kernel_contexts + n_usr_ctxts + num_netdev_contexts;
dd->n_krcv_queues = num_kernel_contexts;
dd->first_dyn_alloc_ctxt = num_kernel_contexts;
dd->num_vnic_contexts = num_vnic_contexts;
dd->num_netdev_contexts = num_netdev_contexts;
dd->num_user_contexts = n_usr_ctxts;
dd->freectxts = n_usr_ctxts;
dd_dev_info(dd,
"rcv contexts: chip %d, used %d (kernel %d, vnic %u, user %u)\n",
"rcv contexts: chip %d, used %d (kernel %d, netdev %u, user %u)\n",
rcv_contexts,
(int)dd->num_rcv_contexts,
(int)dd->n_krcv_queues,
dd->num_vnic_contexts,
dd->num_netdev_contexts,
dd->num_user_contexts);
/*
@ -14119,21 +14175,12 @@ static void init_early_variables(struct hfi1_devdata *dd)
static void init_kdeth_qp(struct hfi1_devdata *dd)
{
/* user changed the KDETH_QP */
if (kdeth_qp != 0 && kdeth_qp >= 0xff) {
/* out of range or illegal value */
dd_dev_err(dd, "Invalid KDETH queue pair prefix, ignoring");
kdeth_qp = 0;
}
if (kdeth_qp == 0) /* not set, or failed range check */
kdeth_qp = DEFAULT_KDETH_QP;
write_csr(dd, SEND_BTH_QP,
(kdeth_qp & SEND_BTH_QP_KDETH_QP_MASK) <<
(RVT_KDETH_QP_PREFIX & SEND_BTH_QP_KDETH_QP_MASK) <<
SEND_BTH_QP_KDETH_QP_SHIFT);
write_csr(dd, RCV_BTH_QP,
(kdeth_qp & RCV_BTH_QP_KDETH_QP_MASK) <<
(RVT_KDETH_QP_PREFIX & RCV_BTH_QP_KDETH_QP_MASK) <<
RCV_BTH_QP_KDETH_QP_SHIFT);
}
@ -14249,6 +14296,12 @@ static void complete_rsm_map_table(struct hfi1_devdata *dd,
}
}
/* Is a receive side mapping rule */
static bool has_rsm_rule(struct hfi1_devdata *dd, u8 rule_index)
{
return read_csr(dd, RCV_RSM_CFG + (8 * rule_index)) != 0;
}
/*
* Add a receive side mapping rule.
*/
@ -14485,77 +14538,138 @@ static void init_fecn_handling(struct hfi1_devdata *dd,
rmt->used += total_cnt;
}
/* Initialize RSM for VNIC */
void hfi1_init_vnic_rsm(struct hfi1_devdata *dd)
static inline bool hfi1_is_rmt_full(int start, int spare)
{
return (start + spare) > NUM_MAP_ENTRIES;
}
static bool hfi1_netdev_update_rmt(struct hfi1_devdata *dd)
{
u8 i, j;
u8 ctx_id = 0;
u64 reg;
u32 regoff;
struct rsm_rule_data rrd;
int rmt_start = hfi1_netdev_get_free_rmt_idx(dd);
int ctxt_count = hfi1_netdev_ctxt_count(dd);
if (hfi1_vnic_is_rsm_full(dd, NUM_VNIC_MAP_ENTRIES)) {
dd_dev_err(dd, "Vnic RSM disabled, rmt entries used = %d\n",
dd->vnic.rmt_start);
return;
/* We already have contexts mapped in RMT */
if (has_rsm_rule(dd, RSM_INS_VNIC) || has_rsm_rule(dd, RSM_INS_AIP)) {
dd_dev_info(dd, "Contexts are already mapped in RMT\n");
return true;
}
dev_dbg(&(dd)->pcidev->dev, "Vnic rsm start = %d, end %d\n",
dd->vnic.rmt_start,
dd->vnic.rmt_start + NUM_VNIC_MAP_ENTRIES);
if (hfi1_is_rmt_full(rmt_start, NUM_NETDEV_MAP_ENTRIES)) {
dd_dev_err(dd, "Not enough RMT entries used = %d\n",
rmt_start);
return false;
}
dev_dbg(&(dd)->pcidev->dev, "RMT start = %d, end %d\n",
rmt_start,
rmt_start + NUM_NETDEV_MAP_ENTRIES);
/* Update RSM mapping table, 32 regs, 256 entries - 1 ctx per byte */
regoff = RCV_RSM_MAP_TABLE + (dd->vnic.rmt_start / 8) * 8;
regoff = RCV_RSM_MAP_TABLE + (rmt_start / 8) * 8;
reg = read_csr(dd, regoff);
for (i = 0; i < NUM_VNIC_MAP_ENTRIES; i++) {
/* Update map register with vnic context */
j = (dd->vnic.rmt_start + i) % 8;
for (i = 0; i < NUM_NETDEV_MAP_ENTRIES; i++) {
/* Update map register with netdev context */
j = (rmt_start + i) % 8;
reg &= ~(0xffllu << (j * 8));
reg |= (u64)dd->vnic.ctxt[ctx_id++]->ctxt << (j * 8);
/* Wrap up vnic ctx index */
ctx_id %= dd->vnic.num_ctxt;
reg |= (u64)hfi1_netdev_get_ctxt(dd, ctx_id++)->ctxt << (j * 8);
/* Wrap up netdev ctx index */
ctx_id %= ctxt_count;
/* Write back map register */
if (j == 7 || ((i + 1) == NUM_VNIC_MAP_ENTRIES)) {
if (j == 7 || ((i + 1) == NUM_NETDEV_MAP_ENTRIES)) {
dev_dbg(&(dd)->pcidev->dev,
"Vnic rsm map reg[%d] =0x%llx\n",
"RMT[%d] =0x%llx\n",
regoff - RCV_RSM_MAP_TABLE, reg);
write_csr(dd, regoff, reg);
regoff += 8;
if (i < (NUM_VNIC_MAP_ENTRIES - 1))
if (i < (NUM_NETDEV_MAP_ENTRIES - 1))
reg = read_csr(dd, regoff);
}
}
/* Add rule for vnic */
rrd.offset = dd->vnic.rmt_start;
rrd.pkt_type = 4;
/* Match 16B packets */
rrd.field1_off = L2_TYPE_MATCH_OFFSET;
rrd.mask1 = L2_TYPE_MASK;
rrd.value1 = L2_16B_VALUE;
/* Match ETH L4 packets */
rrd.field2_off = L4_TYPE_MATCH_OFFSET;
rrd.mask2 = L4_16B_TYPE_MASK;
rrd.value2 = L4_16B_ETH_VALUE;
/* Calc context from veswid and entropy */
rrd.index1_off = L4_16B_HDR_VESWID_OFFSET;
rrd.index1_width = ilog2(NUM_VNIC_MAP_ENTRIES);
rrd.index2_off = L2_16B_ENTROPY_OFFSET;
rrd.index2_width = ilog2(NUM_VNIC_MAP_ENTRIES);
add_rsm_rule(dd, RSM_INS_VNIC, &rrd);
return true;
}
/* Enable RSM if not already enabled */
static void hfi1_enable_rsm_rule(struct hfi1_devdata *dd,
int rule, struct rsm_rule_data *rrd)
{
if (!hfi1_netdev_update_rmt(dd)) {
dd_dev_err(dd, "Failed to update RMT for RSM%d rule\n", rule);
return;
}
add_rsm_rule(dd, rule, rrd);
add_rcvctrl(dd, RCV_CTRL_RCV_RSM_ENABLE_SMASK);
}
void hfi1_init_aip_rsm(struct hfi1_devdata *dd)
{
/*
* go through with the initialisation only if this rule doesn't
* exist yet
*/
if (atomic_fetch_inc(&dd->ipoib_rsm_usr_num) == 0) {
int rmt_start = hfi1_netdev_get_free_rmt_idx(dd);
struct rsm_rule_data rrd = {
.offset = rmt_start,
.pkt_type = IB_PACKET_TYPE,
.field1_off = LRH_BTH_MATCH_OFFSET,
.mask1 = LRH_BTH_MASK,
.value1 = LRH_BTH_VALUE,
.field2_off = BTH_DESTQP_MATCH_OFFSET,
.mask2 = BTH_DESTQP_MASK,
.value2 = BTH_DESTQP_VALUE,
.index1_off = DETH_AIP_SQPN_SELECT_OFFSET +
ilog2(NUM_NETDEV_MAP_ENTRIES),
.index1_width = ilog2(NUM_NETDEV_MAP_ENTRIES),
.index2_off = DETH_AIP_SQPN_SELECT_OFFSET,
.index2_width = ilog2(NUM_NETDEV_MAP_ENTRIES)
};
hfi1_enable_rsm_rule(dd, RSM_INS_AIP, &rrd);
}
}
/* Initialize RSM for VNIC */
void hfi1_init_vnic_rsm(struct hfi1_devdata *dd)
{
int rmt_start = hfi1_netdev_get_free_rmt_idx(dd);
struct rsm_rule_data rrd = {
/* Add rule for vnic */
.offset = rmt_start,
.pkt_type = 4,
/* Match 16B packets */
.field1_off = L2_TYPE_MATCH_OFFSET,
.mask1 = L2_TYPE_MASK,
.value1 = L2_16B_VALUE,
/* Match ETH L4 packets */
.field2_off = L4_TYPE_MATCH_OFFSET,
.mask2 = L4_16B_TYPE_MASK,
.value2 = L4_16B_ETH_VALUE,
/* Calc context from veswid and entropy */
.index1_off = L4_16B_HDR_VESWID_OFFSET,
.index1_width = ilog2(NUM_NETDEV_MAP_ENTRIES),
.index2_off = L2_16B_ENTROPY_OFFSET,
.index2_width = ilog2(NUM_NETDEV_MAP_ENTRIES)
};
hfi1_enable_rsm_rule(dd, RSM_INS_VNIC, &rrd);
}
void hfi1_deinit_vnic_rsm(struct hfi1_devdata *dd)
{
clear_rsm_rule(dd, RSM_INS_VNIC);
}
/* Disable RSM if used only by vnic */
if (dd->vnic.rmt_start == 0)
clear_rcvctrl(dd, RCV_CTRL_RCV_RSM_ENABLE_SMASK);
void hfi1_deinit_aip_rsm(struct hfi1_devdata *dd)
{
/* only actually clear the rule if it's the last user asking to do so */
if (atomic_fetch_add_unless(&dd->ipoib_rsm_usr_num, -1, 0) == 1)
clear_rsm_rule(dd, RSM_INS_AIP);
}
static int init_rxe(struct hfi1_devdata *dd)
@ -14574,8 +14688,8 @@ static int init_rxe(struct hfi1_devdata *dd)
init_qos(dd, rmt);
init_fecn_handling(dd, rmt);
complete_rsm_map_table(dd, rmt);
/* record number of used rsm map entries for vnic */
dd->vnic.rmt_start = rmt->used;
/* record number of used rsm map entries for netdev */
hfi1_netdev_set_free_rmt_idx(dd, rmt->used);
kfree(rmt);
/*
@ -15129,6 +15243,10 @@ int hfi1_init_dd(struct hfi1_devdata *dd)
(dd->revision >> CCE_REVISION_SW_SHIFT)
& CCE_REVISION_SW_MASK);
/* alloc netdev data */
if (hfi1_netdev_alloc(dd))
goto bail_cleanup;
ret = set_up_context_variables(dd);
if (ret)
goto bail_cleanup;
@ -15229,6 +15347,7 @@ bail_clear_intr:
hfi1_comp_vectors_clean_up(dd);
msix_clean_up_interrupts(dd);
bail_cleanup:
hfi1_netdev_free(dd);
hfi1_pcie_ddcleanup(dd);
bail_free:
hfi1_free_devdata(dd);


@ -1,7 +1,7 @@
#ifndef _CHIP_H
#define _CHIP_H
/*
* Copyright(c) 2015 - 2018 Intel Corporation.
* Copyright(c) 2015 - 2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@ -1447,6 +1447,7 @@ irqreturn_t general_interrupt(int irq, void *data);
irqreturn_t sdma_interrupt(int irq, void *data);
irqreturn_t receive_context_interrupt(int irq, void *data);
irqreturn_t receive_context_thread(int irq, void *data);
irqreturn_t receive_context_interrupt_napi(int irq, void *data);
int set_intr_bits(struct hfi1_devdata *dd, u16 first, u16 last, bool set);
void init_qsfp_int(struct hfi1_devdata *dd);
@ -1455,6 +1456,8 @@ void remap_intr(struct hfi1_devdata *dd, int isrc, int msix_intr);
void remap_sdma_interrupts(struct hfi1_devdata *dd, int engine, int msix_intr);
void reset_interrupts(struct hfi1_devdata *dd);
u8 hfi1_get_qp_map(struct hfi1_devdata *dd, u8 idx);
void hfi1_init_aip_rsm(struct hfi1_devdata *dd);
void hfi1_deinit_aip_rsm(struct hfi1_devdata *dd);
/*
* Interrupt source table.


@ -1,5 +1,5 @@
/*
* Copyright(c) 2015 - 2018 Intel Corporation.
* Copyright(c) 2015 - 2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@ -72,13 +72,6 @@
* compilation unit
*/
/*
* If a packet's QP[23:16] bits match this value, then it is
* a PSM packet and the hardware will expect a KDETH header
* following the BTH.
*/
#define DEFAULT_KDETH_QP 0x80
/* driver/hw feature set bitmask */
#define HFI1_CAP_USER_SHIFT 24
#define HFI1_CAP_MASK ((1UL << HFI1_CAP_USER_SHIFT) - 1)
@ -149,7 +142,8 @@
HFI1_CAP_NO_INTEGRITY | \
HFI1_CAP_PKEY_CHECK | \
HFI1_CAP_TID_RDMA | \
HFI1_CAP_OPFN) << \
HFI1_CAP_OPFN | \
HFI1_CAP_AIP) << \
HFI1_CAP_USER_SHIFT)
/*
* Set of capabilities that need to be enabled for kernel context in
@ -166,6 +160,7 @@
HFI1_CAP_PKEY_CHECK | \
HFI1_CAP_MULTI_PKT_EGR | \
HFI1_CAP_EXTENDED_PSN | \
HFI1_CAP_AIP | \
((HFI1_CAP_HDRSUPP | \
HFI1_CAP_MULTI_PKT_EGR | \
HFI1_CAP_STATIC_RATE_CTRL | \


@ -1,5 +1,5 @@
/*
* Copyright(c) 2015-2018 Intel Corporation.
* Copyright(c) 2015-2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@ -54,6 +54,7 @@
#include <linux/module.h>
#include <linux/prefetch.h>
#include <rdma/ib_verbs.h>
#include <linux/etherdevice.h>
#include "hfi.h"
#include "trace.h"
@ -63,6 +64,9 @@
#include "vnic.h"
#include "fault.h"
#include "ipoib.h"
#include "netdev.h"
#undef pr_fmt
#define pr_fmt(fmt) DRIVER_NAME ": " fmt
@ -748,6 +752,39 @@ static noinline int skip_rcv_packet(struct hfi1_packet *packet, int thread)
return ret;
}
static void process_rcv_packet_napi(struct hfi1_packet *packet)
{
packet->etype = rhf_rcv_type(packet->rhf);
/* total length */
packet->tlen = rhf_pkt_len(packet->rhf); /* in bytes */
/* retrieve eager buffer details */
packet->etail = rhf_egr_index(packet->rhf);
packet->ebuf = get_egrbuf(packet->rcd, packet->rhf,
&packet->updegr);
/*
* Prefetch the contents of the eager buffer. It is
* OK to send a negative length to prefetch_range().
* The +2 is the size of the RHF.
*/
prefetch_range(packet->ebuf,
packet->tlen - ((packet->rcd->rcvhdrqentsize -
(rhf_hdrq_offset(packet->rhf)
+ 2)) * 4));
packet->rcd->rhf_rcv_function_map[packet->etype](packet);
packet->numpkt++;
/* Set up for the next packet */
packet->rhqoff += packet->rsize;
if (packet->rhqoff >= packet->maxcnt)
packet->rhqoff = 0;
packet->rhf_addr = (__le32 *)packet->rcd->rcvhdrq + packet->rhqoff +
packet->rcd->rhf_offset;
packet->rhf = rhf_to_cpu(packet->rhf_addr);
}
static inline int process_rcv_packet(struct hfi1_packet *packet, int thread)
{
int ret;
@ -826,6 +863,36 @@ static inline void finish_packet(struct hfi1_packet *packet)
packet->etail, rcv_intr_dynamic, packet->numpkt);
}
/*
* handle_receive_interrupt_napi_fp - receive a packet
* @rcd: the context
* @budget: polling budget
*
* Called from interrupt handler for receive interrupt.
* This is the fast path interrupt handler
* when executing napi soft irq environment.
*/
int handle_receive_interrupt_napi_fp(struct hfi1_ctxtdata *rcd, int budget)
{
struct hfi1_packet packet;
init_packet(rcd, &packet);
if (last_rcv_seq(rcd, rhf_rcv_seq(packet.rhf)))
goto bail;
while (packet.numpkt < budget) {
process_rcv_packet_napi(&packet);
if (hfi1_seq_incr(rcd, rhf_rcv_seq(packet.rhf)))
break;
process_rcv_update(0, &packet);
}
hfi1_set_rcd_head(rcd, packet.rhqoff);
bail:
finish_packet(&packet);
return packet.numpkt;
}
/*
* Handle receive interrupts when using the no dma rtail option.
*/
@ -1073,6 +1140,63 @@ bail:
return last;
}
/*
* handle_receive_interrupt_napi_sp - receive a packet
* @rcd: the context
* @budget: polling budget
*
* Called from interrupt handler for errors or receive interrupt.
* This is the slow path interrupt handler
* when executing napi soft irq environment.
*/
int handle_receive_interrupt_napi_sp(struct hfi1_ctxtdata *rcd, int budget)
{
struct hfi1_devdata *dd = rcd->dd;
int last = RCV_PKT_OK;
bool needset = true;
struct hfi1_packet packet;
init_packet(rcd, &packet);
if (last_rcv_seq(rcd, rhf_rcv_seq(packet.rhf)))
goto bail;
while (last != RCV_PKT_DONE && packet.numpkt < budget) {
if (hfi1_need_drop(dd)) {
/* On to the next packet */
packet.rhqoff += packet.rsize;
packet.rhf_addr = (__le32 *)rcd->rcvhdrq +
packet.rhqoff +
rcd->rhf_offset;
packet.rhf = rhf_to_cpu(packet.rhf_addr);
} else {
if (set_armed_to_active(&packet))
goto bail;
process_rcv_packet_napi(&packet);
}
if (hfi1_seq_incr(rcd, rhf_rcv_seq(packet.rhf)))
last = RCV_PKT_DONE;
if (needset) {
needset = false;
set_all_fastpath(dd, rcd);
}
process_rcv_update(last, &packet);
}
hfi1_set_rcd_head(rcd, packet.rhqoff);
bail:
/*
* Always write head at end, and setup rcv interrupt, even
* if no packets were processed.
*/
finish_packet(&packet);
return packet.numpkt;
}
/*
* We may discover in the interrupt that the hardware link state has
* changed from ARMED to ACTIVE (due to the arrival of a non-SC15 packet),
@ -1550,6 +1674,82 @@ void handle_eflags(struct hfi1_packet *packet)
show_eflags_errs(packet);
}
static void hfi1_ipoib_ib_rcv(struct hfi1_packet *packet)
{
struct hfi1_ibport *ibp;
struct net_device *netdev;
struct hfi1_ctxtdata *rcd = packet->rcd;
struct napi_struct *napi = rcd->napi;
struct sk_buff *skb;
struct hfi1_netdev_rxq *rxq = container_of(napi,
struct hfi1_netdev_rxq, napi);
u32 extra_bytes;
u32 tlen, qpnum;
bool do_work, do_cnp;
struct hfi1_ipoib_dev_priv *priv;
trace_hfi1_rcvhdr(packet);
hfi1_setup_ib_header(packet);
packet->ohdr = &((struct ib_header *)packet->hdr)->u.oth;
packet->grh = NULL;
if (unlikely(rhf_err_flags(packet->rhf))) {
handle_eflags(packet);
return;
}
qpnum = ib_bth_get_qpn(packet->ohdr);
netdev = hfi1_netdev_get_data(rcd->dd, qpnum);
if (!netdev)
goto drop_no_nd;
trace_input_ibhdr(rcd->dd, packet, !!(rhf_dc_info(packet->rhf)));
trace_ctxt_rsm_hist(rcd->ctxt);
/* handle congestion notifications */
do_work = hfi1_may_ecn(packet);
if (unlikely(do_work)) {
do_cnp = (packet->opcode != IB_OPCODE_CNP);
(void)hfi1_process_ecn_slowpath(hfi1_ipoib_priv(netdev)->qp,
packet, do_cnp);
}
/*
* The packet is split after the last byte of the DETH,
* so strip the padding, CRC and ICRC.
* tlen is the whole packet length, so we need to
* subtract the header size as well.
*/
tlen = packet->tlen;
extra_bytes = ib_bth_get_pad(packet->ohdr) + (SIZE_OF_CRC << 2) +
packet->hlen;
if (unlikely(tlen < extra_bytes))
goto drop;
tlen -= extra_bytes;
skb = hfi1_ipoib_prepare_skb(rxq, tlen, packet->ebuf);
if (unlikely(!skb))
goto drop;
priv = hfi1_ipoib_priv(netdev);
hfi1_ipoib_update_rx_netstats(priv, 1, skb->len);
skb->dev = netdev;
skb->pkt_type = PACKET_HOST;
netif_receive_skb(skb);
return;
drop:
++netdev->stats.rx_dropped;
drop_no_nd:
ibp = rcd_to_iport(packet->rcd);
++ibp->rvp.n_pkt_drops;
}
/*
* The following functions are called by the interrupt handler. They are type
* specific handlers for each packet type.
@ -1572,28 +1772,10 @@ static void process_receive_ib(struct hfi1_packet *packet)
hfi1_ib_rcv(packet);
}
static inline bool hfi1_is_vnic_packet(struct hfi1_packet *packet)
{
/* Packet received in VNIC context via RSM */
if (packet->rcd->is_vnic)
return true;
if ((hfi1_16B_get_l2(packet->ebuf) == OPA_16B_L2_TYPE) &&
(hfi1_16B_get_l4(packet->ebuf) == OPA_16B_L4_ETHR))
return true;
return false;
}
static void process_receive_bypass(struct hfi1_packet *packet)
{
struct hfi1_devdata *dd = packet->rcd->dd;
if (hfi1_is_vnic_packet(packet)) {
hfi1_vnic_bypass_rcv(packet);
return;
}
if (hfi1_setup_bypass_packet(packet))
return;
@ -1757,3 +1939,14 @@ const rhf_rcv_function_ptr normal_rhf_rcv_functions[] = {
[RHF_RCV_TYPE_INVALID6] = process_receive_invalid,
[RHF_RCV_TYPE_INVALID7] = process_receive_invalid,
};
const rhf_rcv_function_ptr netdev_rhf_rcv_functions[] = {
[RHF_RCV_TYPE_EXPECTED] = process_receive_invalid,
[RHF_RCV_TYPE_EAGER] = process_receive_invalid,
[RHF_RCV_TYPE_IB] = hfi1_ipoib_ib_rcv,
[RHF_RCV_TYPE_ERROR] = process_receive_error,
[RHF_RCV_TYPE_BYPASS] = hfi1_vnic_bypass_rcv,
[RHF_RCV_TYPE_INVALID5] = process_receive_invalid,
[RHF_RCV_TYPE_INVALID6] = process_receive_invalid,
[RHF_RCV_TYPE_INVALID7] = process_receive_invalid,
};


@ -1,5 +1,5 @@
/*
* Copyright(c) 2015-2017 Intel Corporation.
* Copyright(c) 2015-2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@ -1264,7 +1264,7 @@ static int get_base_info(struct hfi1_filedata *fd, unsigned long arg, u32 len)
memset(&binfo, 0, sizeof(binfo));
binfo.hw_version = dd->revision;
binfo.sw_version = HFI1_KERN_SWVERSION;
binfo.bthqp = kdeth_qp;
binfo.bthqp = RVT_KDETH_QP_PREFIX;
binfo.jkey = uctxt->jkey;
/*
* If more than 64 contexts are enabled the allocated credit


@ -1,7 +1,7 @@
#ifndef _HFI1_KERNEL_H
#define _HFI1_KERNEL_H
/*
* Copyright(c) 2015-2018 Intel Corporation.
* Copyright(c) 2015-2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@ -233,6 +233,8 @@ struct hfi1_ctxtdata {
intr_handler fast_handler;
/** slow handler */
intr_handler slow_handler;
/* napi pointer associated with netdev */
struct napi_struct *napi;
/* verbs rx_stats per rcd */
struct hfi1_opcode_stats_perctx *opstats;
/* clear interrupt mask */
@ -383,11 +385,11 @@ struct hfi1_packet {
u32 rhqoff;
u32 dlid;
u32 slid;
int numpkt;
u16 tlen;
s16 etail;
u16 pkey;
u8 hlen;
u8 numpkt;
u8 rsize;
u8 updegr;
u8 etype;
@ -985,7 +987,7 @@ typedef void (*hfi1_make_req)(struct rvt_qp *qp,
struct hfi1_pkt_state *ps,
struct rvt_swqe *wqe);
extern const rhf_rcv_function_ptr normal_rhf_rcv_functions[];
extern const rhf_rcv_function_ptr netdev_rhf_rcv_functions[];
/* return values for the RHF receive functions */
#define RHF_RCV_CONTINUE 0 /* keep going */
@ -1045,23 +1047,10 @@ struct hfi1_asic_data {
#define NUM_MAP_ENTRIES 256
#define NUM_MAP_REGS 32
/*
* Number of VNIC contexts used. Ensure it is less than or equal to
* max queues supported by VNIC (HFI1_VNIC_MAX_QUEUE).
*/
#define HFI1_NUM_VNIC_CTXT 8
/* Number of VNIC RSM entries */
#define NUM_VNIC_MAP_ENTRIES 8
/* Virtual NIC information */
struct hfi1_vnic_data {
struct hfi1_ctxtdata *ctxt[HFI1_NUM_VNIC_CTXT];
struct kmem_cache *txreq_cache;
struct xarray vesws;
u8 num_vports;
u8 rmt_start;
u8 num_ctxt;
};
struct hfi1_vnic_vport_info;
@ -1167,8 +1156,8 @@ struct hfi1_devdata {
u64 z_send_schedule;
u64 __percpu *send_schedule;
/* number of reserved contexts for VNIC usage */
u16 num_vnic_contexts;
/* number of reserved contexts for netdev usage */
u16 num_netdev_contexts;
/* number of receive contexts in use by the driver */
u32 num_rcv_contexts;
/* number of pio send contexts in use by the driver */
@ -1417,12 +1406,12 @@ struct hfi1_devdata {
struct hfi1_vnic_data vnic;
/* Lock to protect IRQ SRC register access */
spinlock_t irq_src_lock;
};
int vnic_num_vports;
struct net_device *dummy_netdev;
static inline bool hfi1_vnic_is_rsm_full(struct hfi1_devdata *dd, int spare)
{
return (dd->vnic.rmt_start + spare) > NUM_MAP_ENTRIES;
}
/* Keeps track of IPoIB RSM rule users */
atomic_t ipoib_rsm_usr_num;
};
/* 8051 firmware version helper */
#define dc8051_ver(a, b, c) ((a) << 16 | (b) << 8 | (c))
@ -1500,6 +1489,8 @@ struct hfi1_ctxtdata *hfi1_rcd_get_by_index(struct hfi1_devdata *dd, u16 ctxt);
int handle_receive_interrupt(struct hfi1_ctxtdata *rcd, int thread);
int handle_receive_interrupt_nodma_rtail(struct hfi1_ctxtdata *rcd, int thread);
int handle_receive_interrupt_dma_rtail(struct hfi1_ctxtdata *rcd, int thread);
int handle_receive_interrupt_napi_fp(struct hfi1_ctxtdata *rcd, int budget);
int handle_receive_interrupt_napi_sp(struct hfi1_ctxtdata *rcd, int budget);
void set_all_slowpath(struct hfi1_devdata *dd);
extern const struct pci_device_id hfi1_pci_tbl[];
@ -2250,7 +2241,6 @@ extern int num_user_contexts;
extern unsigned long n_krcvqs;
extern uint krcvqs[];
extern int krcvqsset;
extern uint kdeth_qp;
extern uint loopback;
extern uint quick_linkup;
extern uint rcv_intr_timeout;


@ -1,5 +1,5 @@
/*
* Copyright(c) 2015 - 2018 Intel Corporation.
* Copyright(c) 2015 - 2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@ -69,6 +69,7 @@
#include "affinity.h"
#include "vnic.h"
#include "exp_rcv.h"
#include "netdev.h"
#undef pr_fmt
#define pr_fmt(fmt) DRIVER_NAME ": " fmt
@ -374,6 +375,7 @@ int hfi1_create_ctxtdata(struct hfi1_pportdata *ppd, int numa,
rcd->numa_id = numa;
rcd->rcv_array_groups = dd->rcv_entries.ngroups;
rcd->rhf_rcv_function_map = normal_rhf_rcv_functions;
rcd->msix_intr = CCE_NUM_MSIX_VECTORS;
mutex_init(&rcd->exp_mutex);
spin_lock_init(&rcd->exp_lock);
@ -1316,6 +1318,7 @@ static struct hfi1_devdata *hfi1_alloc_devdata(struct pci_dev *pdev,
goto bail;
}
atomic_set(&dd->ipoib_rsm_usr_num, 0);
return dd;
bail:
@ -1663,9 +1666,6 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
/* do the generic initialization */
initfail = hfi1_init(dd, 0);
/* setup vnic */
hfi1_vnic_setup(dd);
ret = hfi1_register_ib_device(dd);
/*
@ -1704,7 +1704,6 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
hfi1_device_remove(dd);
if (!ret)
hfi1_unregister_ib_device(dd);
hfi1_vnic_cleanup(dd);
postinit_cleanup(dd);
if (initfail)
ret = initfail;
@ -1749,8 +1748,8 @@ static void remove_one(struct pci_dev *pdev)
/* unregister from IB core */
hfi1_unregister_ib_device(dd);
/* cleanup vnic */
hfi1_vnic_cleanup(dd);
/* free netdev data */
hfi1_netdev_free(dd);
/*
* Disable the IB link, disable interrupts on the device,


@ -0,0 +1,171 @@
/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
/*
* Copyright(c) 2020 Intel Corporation.
*
*/
/*
* This file contains HFI1 support for IPOIB functionality
*/
#ifndef HFI1_IPOIB_H
#define HFI1_IPOIB_H
#include <linux/types.h>
#include <linux/stddef.h>
#include <linux/atomic.h>
#include <linux/netdevice.h>
#include <linux/slab.h>
#include <linux/skbuff.h>
#include <linux/list.h>
#include <linux/if_infiniband.h>
#include "hfi.h"
#include "iowait.h"
#include "netdev.h"
#include <rdma/ib_verbs.h>
#define HFI1_IPOIB_ENTROPY_SHIFT 24
#define HFI1_IPOIB_TXREQ_NAME_LEN 32
#define HFI1_IPOIB_PSEUDO_LEN 20
#define HFI1_IPOIB_ENCAP_LEN 4
struct hfi1_ipoib_dev_priv;
union hfi1_ipoib_flow {
u16 as_int;
struct {
u8 tx_queue;
u8 sc5;
} __attribute__((__packed__));
};
/**
* struct hfi1_ipoib_circ_buf - List of items to be processed
* @items: ring of items
* @head: ring head
* @tail: ring tail
* @max_items: max items + 1 that the ring can contain
* @producer_lock: producer sync lock
* @consumer_lock: consumer sync lock
*/
struct hfi1_ipoib_circ_buf {
void **items;
unsigned long head;
unsigned long tail;
unsigned long max_items;
spinlock_t producer_lock; /* head sync lock */
spinlock_t consumer_lock; /* tail sync lock */
};
/**
* struct hfi1_ipoib_txq - IPOIB per Tx queue information
* @priv: private pointer
* @sde: sdma engine
* @tx_list: tx request list
* @sent_txreqs: count of txreqs posted to sdma
* @flow: tracks when list needs to be flushed for a flow change
* @q_idx: ipoib Tx queue index
* @pkts_sent: indicator packets have been sent from this queue
* @wait: iowait structure
* @complete_txreqs: count of txreqs completed by sdma
* @napi: pointer to tx napi interface
* @tx_ring: ring of ipoib txreqs to be reaped by napi callback
*/
struct hfi1_ipoib_txq {
struct hfi1_ipoib_dev_priv *priv;
struct sdma_engine *sde;
struct list_head tx_list;
u64 sent_txreqs;
union hfi1_ipoib_flow flow;
u8 q_idx;
bool pkts_sent;
struct iowait wait;
atomic64_t ____cacheline_aligned_in_smp complete_txreqs;
struct napi_struct *napi;
struct hfi1_ipoib_circ_buf tx_ring;
};
struct hfi1_ipoib_dev_priv {
struct hfi1_devdata *dd;
struct net_device *netdev;
struct ib_device *device;
struct hfi1_ipoib_txq *txqs;
struct kmem_cache *txreq_cache;
struct napi_struct *tx_napis;
u16 pkey;
u16 pkey_index;
u32 qkey;
u8 port_num;
const struct net_device_ops *netdev_ops;
struct rvt_qp *qp;
struct pcpu_sw_netstats __percpu *netstats;
};
/* hfi1 ipoib rdma netdev's private data structure */
struct hfi1_ipoib_rdma_netdev {
struct rdma_netdev rn; /* keep this first */
/* followed by device private data */
struct hfi1_ipoib_dev_priv dev_priv;
};
static inline struct hfi1_ipoib_dev_priv *
hfi1_ipoib_priv(const struct net_device *dev)
{
return &((struct hfi1_ipoib_rdma_netdev *)netdev_priv(dev))->dev_priv;
}
static inline void
hfi1_ipoib_update_rx_netstats(struct hfi1_ipoib_dev_priv *priv,
u64 packets,
u64 bytes)
{
struct pcpu_sw_netstats *netstats = this_cpu_ptr(priv->netstats);
u64_stats_update_begin(&netstats->syncp);
netstats->rx_packets += packets;
netstats->rx_bytes += bytes;
u64_stats_update_end(&netstats->syncp);
}
static inline void
hfi1_ipoib_update_tx_netstats(struct hfi1_ipoib_dev_priv *priv,
u64 packets,
u64 bytes)
{
struct pcpu_sw_netstats *netstats = this_cpu_ptr(priv->netstats);
u64_stats_update_begin(&netstats->syncp);
netstats->tx_packets += packets;
netstats->tx_bytes += bytes;
u64_stats_update_end(&netstats->syncp);
}
int hfi1_ipoib_send_dma(struct net_device *dev,
struct sk_buff *skb,
struct ib_ah *address,
u32 dqpn);
int hfi1_ipoib_txreq_init(struct hfi1_ipoib_dev_priv *priv);
void hfi1_ipoib_txreq_deinit(struct hfi1_ipoib_dev_priv *priv);
int hfi1_ipoib_rxq_init(struct net_device *dev);
void hfi1_ipoib_rxq_deinit(struct net_device *dev);
void hfi1_ipoib_napi_tx_enable(struct net_device *dev);
void hfi1_ipoib_napi_tx_disable(struct net_device *dev);
struct sk_buff *hfi1_ipoib_prepare_skb(struct hfi1_netdev_rxq *rxq,
int size, void *data);
int hfi1_ipoib_rn_get_params(struct ib_device *device,
u8 port_num,
enum rdma_netdev_t type,
struct rdma_netdev_alloc_params *params);
#endif /* HFI1_IPOIB_H */


@ -0,0 +1,309 @@
// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
/*
* Copyright(c) 2020 Intel Corporation.
*
*/
/*
* This file contains HFI1 support for ipoib functionality
*/
#include "ipoib.h"
#include "hfi.h"
static u32 qpn_from_mac(u8 *mac_arr)
{
return (u32)mac_arr[1] << 16 | mac_arr[2] << 8 | mac_arr[3];
}
static int hfi1_ipoib_dev_init(struct net_device *dev)
{
struct hfi1_ipoib_dev_priv *priv = hfi1_ipoib_priv(dev);
int ret;
priv->netstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
ret = priv->netdev_ops->ndo_init(dev);
if (ret)
return ret;
ret = hfi1_netdev_add_data(priv->dd,
qpn_from_mac(priv->netdev->dev_addr),
dev);
if (ret < 0) {
priv->netdev_ops->ndo_uninit(dev);
return ret;
}
return 0;
}
static void hfi1_ipoib_dev_uninit(struct net_device *dev)
{
struct hfi1_ipoib_dev_priv *priv = hfi1_ipoib_priv(dev);
hfi1_netdev_remove_data(priv->dd, qpn_from_mac(priv->netdev->dev_addr));
priv->netdev_ops->ndo_uninit(dev);
}
static int hfi1_ipoib_dev_open(struct net_device *dev)
{
struct hfi1_ipoib_dev_priv *priv = hfi1_ipoib_priv(dev);
int ret;
ret = priv->netdev_ops->ndo_open(dev);
if (!ret) {
struct hfi1_ibport *ibp = to_iport(priv->device,
priv->port_num);
struct rvt_qp *qp;
u32 qpn = qpn_from_mac(priv->netdev->dev_addr);
rcu_read_lock();
qp = rvt_lookup_qpn(ib_to_rvt(priv->device), &ibp->rvp, qpn);
if (!qp) {
rcu_read_unlock();
priv->netdev_ops->ndo_stop(dev);
return -EINVAL;
}
rvt_get_qp(qp);
priv->qp = qp;
rcu_read_unlock();
hfi1_netdev_enable_queues(priv->dd);
hfi1_ipoib_napi_tx_enable(dev);
}
return ret;
}
static int hfi1_ipoib_dev_stop(struct net_device *dev)
{
struct hfi1_ipoib_dev_priv *priv = hfi1_ipoib_priv(dev);
if (!priv->qp)
return 0;
hfi1_ipoib_napi_tx_disable(dev);
hfi1_netdev_disable_queues(priv->dd);
rvt_put_qp(priv->qp);
priv->qp = NULL;
return priv->netdev_ops->ndo_stop(dev);
}
static void hfi1_ipoib_dev_get_stats64(struct net_device *dev,
struct rtnl_link_stats64 *storage)
{
struct hfi1_ipoib_dev_priv *priv = hfi1_ipoib_priv(dev);
u64 rx_packets = 0ull;
u64 rx_bytes = 0ull;
u64 tx_packets = 0ull;
u64 tx_bytes = 0ull;
int i;
netdev_stats_to_stats64(storage, &dev->stats);
for_each_possible_cpu(i) {
const struct pcpu_sw_netstats *stats;
unsigned int start;
u64 trx_packets;
u64 trx_bytes;
u64 ttx_packets;
u64 ttx_bytes;
stats = per_cpu_ptr(priv->netstats, i);
do {
start = u64_stats_fetch_begin_irq(&stats->syncp);
trx_packets = stats->rx_packets;
trx_bytes = stats->rx_bytes;
ttx_packets = stats->tx_packets;
ttx_bytes = stats->tx_bytes;
} while (u64_stats_fetch_retry_irq(&stats->syncp, start));
rx_packets += trx_packets;
rx_bytes += trx_bytes;
tx_packets += ttx_packets;
tx_bytes += ttx_bytes;
}
storage->rx_packets += rx_packets;
storage->rx_bytes += rx_bytes;
storage->tx_packets += tx_packets;
storage->tx_bytes += tx_bytes;
}
static const struct net_device_ops hfi1_ipoib_netdev_ops = {
.ndo_init = hfi1_ipoib_dev_init,
.ndo_uninit = hfi1_ipoib_dev_uninit,
.ndo_open = hfi1_ipoib_dev_open,
.ndo_stop = hfi1_ipoib_dev_stop,
.ndo_get_stats64 = hfi1_ipoib_dev_get_stats64,
};
static int hfi1_ipoib_send(struct net_device *dev,
struct sk_buff *skb,
struct ib_ah *address,
u32 dqpn)
{
return hfi1_ipoib_send_dma(dev, skb, address, dqpn);
}
static int hfi1_ipoib_mcast_attach(struct net_device *dev,
struct ib_device *device,
union ib_gid *mgid,
u16 mlid,
int set_qkey,
u32 qkey)
{
struct hfi1_ipoib_dev_priv *priv = hfi1_ipoib_priv(dev);
u32 qpn = (u32)qpn_from_mac(priv->netdev->dev_addr);
struct hfi1_ibport *ibp = to_iport(priv->device, priv->port_num);
struct rvt_qp *qp;
int ret = -EINVAL;
rcu_read_lock();
qp = rvt_lookup_qpn(ib_to_rvt(priv->device), &ibp->rvp, qpn);
if (qp) {
rvt_get_qp(qp);
rcu_read_unlock();
if (set_qkey)
priv->qkey = qkey;
/* attach QP to multicast group */
ret = ib_attach_mcast(&qp->ibqp, mgid, mlid);
rvt_put_qp(qp);
} else {
rcu_read_unlock();
}
return ret;
}
static int hfi1_ipoib_mcast_detach(struct net_device *dev,
struct ib_device *device,
union ib_gid *mgid,
u16 mlid)
{
struct hfi1_ipoib_dev_priv *priv = hfi1_ipoib_priv(dev);
u32 qpn = (u32)qpn_from_mac(priv->netdev->dev_addr);
struct hfi1_ibport *ibp = to_iport(priv->device, priv->port_num);
struct rvt_qp *qp;
int ret = -EINVAL;
rcu_read_lock();
qp = rvt_lookup_qpn(ib_to_rvt(priv->device), &ibp->rvp, qpn);
if (qp) {
rvt_get_qp(qp);
rcu_read_unlock();
ret = ib_detach_mcast(&qp->ibqp, mgid, mlid);
rvt_put_qp(qp);
} else {
rcu_read_unlock();
}
return ret;
}
static void hfi1_ipoib_netdev_dtor(struct net_device *dev)
{
struct hfi1_ipoib_dev_priv *priv = hfi1_ipoib_priv(dev);
hfi1_ipoib_txreq_deinit(priv);
hfi1_ipoib_rxq_deinit(priv->netdev);
free_percpu(priv->netstats);
}
static void hfi1_ipoib_free_rdma_netdev(struct net_device *dev)
{
hfi1_ipoib_netdev_dtor(dev);
free_netdev(dev);
}
static void hfi1_ipoib_set_id(struct net_device *dev, int id)
{
struct hfi1_ipoib_dev_priv *priv = hfi1_ipoib_priv(dev);
priv->pkey_index = (u16)id;
ib_query_pkey(priv->device,
priv->port_num,
priv->pkey_index,
&priv->pkey);
}
static int hfi1_ipoib_setup_rn(struct ib_device *device,
u8 port_num,
struct net_device *netdev,
void *param)
{
struct hfi1_devdata *dd = dd_from_ibdev(device);
struct rdma_netdev *rn = netdev_priv(netdev);
struct hfi1_ipoib_dev_priv *priv;
int rc;
rn->send = hfi1_ipoib_send;
rn->attach_mcast = hfi1_ipoib_mcast_attach;
rn->detach_mcast = hfi1_ipoib_mcast_detach;
rn->set_id = hfi1_ipoib_set_id;
rn->hca = device;
rn->port_num = port_num;
rn->mtu = netdev->mtu;
priv = hfi1_ipoib_priv(netdev);
priv->dd = dd;
priv->netdev = netdev;
priv->device = device;
priv->port_num = port_num;
priv->netdev_ops = netdev->netdev_ops;
netdev->netdev_ops = &hfi1_ipoib_netdev_ops;
ib_query_pkey(device, port_num, priv->pkey_index, &priv->pkey);
rc = hfi1_ipoib_txreq_init(priv);
if (rc) {
dd_dev_err(dd, "IPoIB netdev TX init - failed(%d)\n", rc);
hfi1_ipoib_free_rdma_netdev(netdev);
return rc;
}
rc = hfi1_ipoib_rxq_init(netdev);
if (rc) {
dd_dev_err(dd, "IPoIB netdev RX init - failed(%d)\n", rc);
hfi1_ipoib_free_rdma_netdev(netdev);
return rc;
}
netdev->priv_destructor = hfi1_ipoib_netdev_dtor;
netdev->needs_free_netdev = true;
return 0;
}
int hfi1_ipoib_rn_get_params(struct ib_device *device,
u8 port_num,
enum rdma_netdev_t type,
struct rdma_netdev_alloc_params *params)
{
struct hfi1_devdata *dd = dd_from_ibdev(device);
if (type != RDMA_NETDEV_IPOIB)
return -EOPNOTSUPP;
if (!HFI1_CAP_IS_KSET(AIP) || !dd->num_netdev_contexts)
return -EOPNOTSUPP;
if (!port_num || port_num > dd->num_pports)
return -EINVAL;
params->sizeof_priv = sizeof(struct hfi1_ipoib_rdma_netdev);
params->txqs = dd->num_sdma;
params->rxqs = dd->num_netdev_contexts;
params->param = NULL;
params->initialize_rdma_netdev = hfi1_ipoib_setup_rn;
return 0;
}


@@ -0,0 +1,95 @@
// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
/*
* Copyright(c) 2020 Intel Corporation.
*
*/
#include "netdev.h"
#include "ipoib.h"
#define HFI1_IPOIB_SKB_PAD ((NET_SKB_PAD) + (NET_IP_ALIGN))
static void copy_ipoib_buf(struct sk_buff *skb, void *data, int size)
{
void *dst_data;
skb_checksum_none_assert(skb);
skb->protocol = *((__be16 *)data);
dst_data = skb_put(skb, size);
memcpy(dst_data, data, size);
skb->mac_header = HFI1_IPOIB_PSEUDO_LEN;
skb_pull(skb, HFI1_IPOIB_ENCAP_LEN);
}
static struct sk_buff *prepare_frag_skb(struct napi_struct *napi, int size)
{
struct sk_buff *skb;
int skb_size = SKB_DATA_ALIGN(size + HFI1_IPOIB_SKB_PAD);
void *frag;
skb_size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
skb_size = SKB_DATA_ALIGN(skb_size);
frag = napi_alloc_frag(skb_size);
if (unlikely(!frag))
return napi_alloc_skb(napi, size);
skb = build_skb(frag, skb_size);
if (unlikely(!skb)) {
skb_free_frag(frag);
return NULL;
}
skb_reserve(skb, HFI1_IPOIB_SKB_PAD);
return skb;
}
struct sk_buff *hfi1_ipoib_prepare_skb(struct hfi1_netdev_rxq *rxq,
int size, void *data)
{
struct napi_struct *napi = &rxq->napi;
int skb_size = size + HFI1_IPOIB_ENCAP_LEN;
struct sk_buff *skb;
/*
* For smaller allocations (up to one page including skb overhead) use
* the napi skb cache. Otherwise fall back to the napi frag cache.
*/
if (size <= SKB_WITH_OVERHEAD(PAGE_SIZE))
skb = napi_alloc_skb(napi, skb_size);
else
skb = prepare_frag_skb(napi, skb_size);
if (unlikely(!skb))
return NULL;
copy_ipoib_buf(skb, data, size);
return skb;
}
int hfi1_ipoib_rxq_init(struct net_device *netdev)
{
struct hfi1_ipoib_dev_priv *ipoib_priv = hfi1_ipoib_priv(netdev);
struct hfi1_devdata *dd = ipoib_priv->dd;
int ret;
ret = hfi1_netdev_rx_init(dd);
if (ret)
return ret;
hfi1_init_aip_rsm(dd);
return ret;
}
void hfi1_ipoib_rxq_deinit(struct net_device *netdev)
{
struct hfi1_ipoib_dev_priv *ipoib_priv = hfi1_ipoib_priv(netdev);
struct hfi1_devdata *dd = ipoib_priv->dd;
hfi1_deinit_aip_rsm(dd);
hfi1_netdev_rx_destroy(dd);
}


@@ -0,0 +1,828 @@
// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
/*
* Copyright(c) 2020 Intel Corporation.
*
*/
/*
* This file contains HFI1 support for IPoIB SDMA functionality
*/
#include <linux/log2.h>
#include <linux/circ_buf.h>
#include "sdma.h"
#include "verbs.h"
#include "trace_ibhdrs.h"
#include "ipoib.h"
/* Add a convenience helper */
#define CIRC_ADD(val, add, size) (((val) + (add)) & ((size) - 1))
#define CIRC_NEXT(val, size) CIRC_ADD(val, 1, size)
#define CIRC_PREV(val, size) CIRC_ADD(val, -1, size)
/**
* struct ipoib_txreq - IPOIB transmit descriptor
* @txreq: sdma transmit request
* @sdma_hdr: 9B IB headers
* @sdma_status: status returned by sdma engine
* @priv: ipoib netdev private data
* @txq: txq on which skb was output
* @skb: skb to send
*/
struct ipoib_txreq {
struct sdma_txreq txreq;
struct hfi1_sdma_header sdma_hdr;
int sdma_status;
struct hfi1_ipoib_dev_priv *priv;
struct hfi1_ipoib_txq *txq;
struct sk_buff *skb;
};
struct ipoib_txparms {
struct hfi1_devdata *dd;
struct rdma_ah_attr *ah_attr;
struct hfi1_ibport *ibp;
struct hfi1_ipoib_txq *txq;
union hfi1_ipoib_flow flow;
u32 dqpn;
u8 hdr_dwords;
u8 entropy;
};
static u64 hfi1_ipoib_txreqs(const u64 sent, const u64 completed)
{
return sent - completed;
}
static void hfi1_ipoib_check_queue_depth(struct hfi1_ipoib_txq *txq)
{
if (unlikely(hfi1_ipoib_txreqs(++txq->sent_txreqs,
atomic64_read(&txq->complete_txreqs)) >=
min_t(unsigned int, txq->priv->netdev->tx_queue_len,
txq->tx_ring.max_items - 1)))
netif_stop_subqueue(txq->priv->netdev, txq->q_idx);
}
static void hfi1_ipoib_check_queue_stopped(struct hfi1_ipoib_txq *txq)
{
struct net_device *dev = txq->priv->netdev;
/* If the queue is already running just return */
if (likely(!__netif_subqueue_stopped(dev, txq->q_idx)))
return;
/* If shutting down just return as queue state is irrelevant */
if (unlikely(dev->reg_state != NETREG_REGISTERED))
return;
/*
* When the queue has been drained to less than half full it will be
* restarted.
* The size of the txreq ring is fixed at initialization.
* The tx queue len can be adjusted upward while the interface is
* running.
* The tx queue len can be large enough to overflow the txreq_ring.
* Use the minimum of the current tx_queue_len or the rings max txreqs
* to protect against ring overflow.
*/
if (hfi1_ipoib_txreqs(txq->sent_txreqs,
atomic64_read(&txq->complete_txreqs))
< min_t(unsigned int, dev->tx_queue_len,
txq->tx_ring.max_items) >> 1)
netif_wake_subqueue(dev, txq->q_idx);
}
static void hfi1_ipoib_free_tx(struct ipoib_txreq *tx, int budget)
{
struct hfi1_ipoib_dev_priv *priv = tx->priv;
if (likely(!tx->sdma_status)) {
hfi1_ipoib_update_tx_netstats(priv, 1, tx->skb->len);
} else {
++priv->netdev->stats.tx_errors;
dd_dev_warn(priv->dd,
"%s: Status = 0x%x pbc 0x%llx txq = %d sde = %d\n",
__func__, tx->sdma_status,
le64_to_cpu(tx->sdma_hdr.pbc), tx->txq->q_idx,
tx->txq->sde->this_idx);
}
napi_consume_skb(tx->skb, budget);
sdma_txclean(priv->dd, &tx->txreq);
kmem_cache_free(priv->txreq_cache, tx);
}
static int hfi1_ipoib_drain_tx_ring(struct hfi1_ipoib_txq *txq, int budget)
{
struct hfi1_ipoib_circ_buf *tx_ring = &txq->tx_ring;
unsigned long head;
unsigned long tail;
unsigned int max_tx;
int work_done;
int tx_count;
spin_lock_bh(&tx_ring->consumer_lock);
/* Read index before reading contents at that index. */
head = smp_load_acquire(&tx_ring->head);
tail = tx_ring->tail;
max_tx = tx_ring->max_items;
work_done = min_t(int, CIRC_CNT(head, tail, max_tx), budget);
for (tx_count = work_done; tx_count; tx_count--) {
hfi1_ipoib_free_tx(tx_ring->items[tail], budget);
tail = CIRC_NEXT(tail, max_tx);
}
atomic64_add(work_done, &txq->complete_txreqs);
/* Finished freeing tx items so store the tail value. */
smp_store_release(&tx_ring->tail, tail);
spin_unlock_bh(&tx_ring->consumer_lock);
hfi1_ipoib_check_queue_stopped(txq);
return work_done;
}
static int hfi1_ipoib_process_tx_ring(struct napi_struct *napi, int budget)
{
struct hfi1_ipoib_dev_priv *priv = hfi1_ipoib_priv(napi->dev);
struct hfi1_ipoib_txq *txq = &priv->txqs[napi - priv->tx_napis];
int work_done = hfi1_ipoib_drain_tx_ring(txq, budget);
if (work_done < budget)
napi_complete_done(napi, work_done);
return work_done;
}
static void hfi1_ipoib_add_tx(struct ipoib_txreq *tx)
{
struct hfi1_ipoib_circ_buf *tx_ring = &tx->txq->tx_ring;
unsigned long head;
unsigned long tail;
size_t max_tx;
spin_lock(&tx_ring->producer_lock);
head = tx_ring->head;
tail = READ_ONCE(tx_ring->tail);
max_tx = tx_ring->max_items;
if (likely(CIRC_SPACE(head, tail, max_tx))) {
tx_ring->items[head] = tx;
/* Finish storing txreq before incrementing head. */
smp_store_release(&tx_ring->head, CIRC_ADD(head, 1, max_tx));
napi_schedule(tx->txq->napi);
} else {
struct hfi1_ipoib_txq *txq = tx->txq;
struct hfi1_ipoib_dev_priv *priv = tx->priv;
/* Ring was full */
hfi1_ipoib_free_tx(tx, 0);
atomic64_inc(&txq->complete_txreqs);
dd_dev_dbg(priv->dd, "txq %d full.\n", txq->q_idx);
}
spin_unlock(&tx_ring->producer_lock);
}
static void hfi1_ipoib_sdma_complete(struct sdma_txreq *txreq, int status)
{
struct ipoib_txreq *tx = container_of(txreq, struct ipoib_txreq, txreq);
tx->sdma_status = status;
hfi1_ipoib_add_tx(tx);
}
static int hfi1_ipoib_build_ulp_payload(struct ipoib_txreq *tx,
struct ipoib_txparms *txp)
{
struct hfi1_devdata *dd = txp->dd;
struct sdma_txreq *txreq = &tx->txreq;
struct sk_buff *skb = tx->skb;
int ret = 0;
int i;
if (skb_headlen(skb)) {
ret = sdma_txadd_kvaddr(dd, txreq, skb->data, skb_headlen(skb));
if (unlikely(ret))
return ret;
}
for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
ret = sdma_txadd_page(dd,
txreq,
skb_frag_page(frag),
frag->bv_offset,
skb_frag_size(frag));
if (unlikely(ret))
break;
}
return ret;
}
static int hfi1_ipoib_build_tx_desc(struct ipoib_txreq *tx,
struct ipoib_txparms *txp)
{
struct hfi1_devdata *dd = txp->dd;
struct sdma_txreq *txreq = &tx->txreq;
struct hfi1_sdma_header *sdma_hdr = &tx->sdma_hdr;
u16 pkt_bytes =
sizeof(sdma_hdr->pbc) + (txp->hdr_dwords << 2) + tx->skb->len;
int ret;
ret = sdma_txinit(txreq, 0, pkt_bytes, hfi1_ipoib_sdma_complete);
if (unlikely(ret))
return ret;
/* add pbc + headers */
ret = sdma_txadd_kvaddr(dd,
txreq,
sdma_hdr,
sizeof(sdma_hdr->pbc) + (txp->hdr_dwords << 2));
if (unlikely(ret))
return ret;
/* add the ulp payload */
return hfi1_ipoib_build_ulp_payload(tx, txp);
}
static void hfi1_ipoib_build_ib_tx_headers(struct ipoib_txreq *tx,
struct ipoib_txparms *txp)
{
struct hfi1_ipoib_dev_priv *priv = tx->priv;
struct hfi1_sdma_header *sdma_hdr = &tx->sdma_hdr;
struct sk_buff *skb = tx->skb;
struct hfi1_pportdata *ppd = ppd_from_ibp(txp->ibp);
struct rdma_ah_attr *ah_attr = txp->ah_attr;
struct ib_other_headers *ohdr;
struct ib_grh *grh;
u16 dwords;
u16 slid;
u16 dlid;
u16 lrh0;
u32 bth0;
u32 sqpn = (u32)(priv->netdev->dev_addr[1] << 16 |
priv->netdev->dev_addr[2] << 8 |
priv->netdev->dev_addr[3]);
u16 payload_dwords;
u8 pad_cnt;
pad_cnt = -skb->len & 3;
/* Includes ICRC */
payload_dwords = ((skb->len + pad_cnt) >> 2) + SIZE_OF_CRC;
/* header size in dwords LRH+BTH+DETH = (8+12+8)/4. */
txp->hdr_dwords = 7;
if (rdma_ah_get_ah_flags(ah_attr) & IB_AH_GRH) {
grh = &sdma_hdr->hdr.ibh.u.l.grh;
txp->hdr_dwords +=
hfi1_make_grh(txp->ibp,
grh,
rdma_ah_read_grh(ah_attr),
txp->hdr_dwords - LRH_9B_DWORDS,
payload_dwords);
lrh0 = HFI1_LRH_GRH;
ohdr = &sdma_hdr->hdr.ibh.u.l.oth;
} else {
lrh0 = HFI1_LRH_BTH;
ohdr = &sdma_hdr->hdr.ibh.u.oth;
}
lrh0 |= (rdma_ah_get_sl(ah_attr) & 0xf) << 4;
lrh0 |= (txp->flow.sc5 & 0xf) << 12;
dlid = opa_get_lid(rdma_ah_get_dlid(ah_attr), 9B);
if (dlid == be16_to_cpu(IB_LID_PERMISSIVE)) {
slid = be16_to_cpu(IB_LID_PERMISSIVE);
} else {
u16 lid = (u16)ppd->lid;
if (lid) {
lid |= rdma_ah_get_path_bits(ah_attr) &
((1 << ppd->lmc) - 1);
slid = lid;
} else {
slid = be16_to_cpu(IB_LID_PERMISSIVE);
}
}
/* Includes ICRC */
dwords = txp->hdr_dwords + payload_dwords;
/* Build the lrh */
sdma_hdr->hdr.hdr_type = HFI1_PKT_TYPE_9B;
hfi1_make_ib_hdr(&sdma_hdr->hdr.ibh, lrh0, dwords, dlid, slid);
/* Build the bth */
bth0 = (IB_OPCODE_UD_SEND_ONLY << 24) | (pad_cnt << 20) | priv->pkey;
ohdr->bth[0] = cpu_to_be32(bth0);
ohdr->bth[1] = cpu_to_be32(txp->dqpn);
ohdr->bth[2] = cpu_to_be32(mask_psn((u32)txp->txq->sent_txreqs));
/* Build the deth */
ohdr->u.ud.deth[0] = cpu_to_be32(priv->qkey);
ohdr->u.ud.deth[1] = cpu_to_be32((txp->entropy <<
HFI1_IPOIB_ENTROPY_SHIFT) | sqpn);
/* Construct the pbc. */
sdma_hdr->pbc =
cpu_to_le64(create_pbc(ppd,
ib_is_sc5(txp->flow.sc5) <<
PBC_DC_INFO_SHIFT,
0,
sc_to_vlt(priv->dd, txp->flow.sc5),
dwords - SIZE_OF_CRC +
(sizeof(sdma_hdr->pbc) >> 2)));
}
static struct ipoib_txreq *hfi1_ipoib_send_dma_common(struct net_device *dev,
struct sk_buff *skb,
struct ipoib_txparms *txp)
{
struct hfi1_ipoib_dev_priv *priv = hfi1_ipoib_priv(dev);
struct ipoib_txreq *tx;
int ret;
tx = kmem_cache_alloc_node(priv->txreq_cache,
GFP_ATOMIC,
priv->dd->node);
if (unlikely(!tx))
return ERR_PTR(-ENOMEM);
/* so that we can test if the sdma descriptors are there */
tx->txreq.num_desc = 0;
tx->priv = priv;
tx->txq = txp->txq;
tx->skb = skb;
hfi1_ipoib_build_ib_tx_headers(tx, txp);
ret = hfi1_ipoib_build_tx_desc(tx, txp);
if (likely(!ret)) {
if (txp->txq->flow.as_int != txp->flow.as_int) {
txp->txq->flow.tx_queue = txp->flow.tx_queue;
txp->txq->flow.sc5 = txp->flow.sc5;
txp->txq->sde =
sdma_select_engine_sc(priv->dd,
txp->flow.tx_queue,
txp->flow.sc5);
}
return tx;
}
sdma_txclean(priv->dd, &tx->txreq);
kmem_cache_free(priv->txreq_cache, tx);
return ERR_PTR(ret);
}
static int hfi1_ipoib_submit_tx_list(struct net_device *dev,
struct hfi1_ipoib_txq *txq)
{
int ret;
u16 count_out;
ret = sdma_send_txlist(txq->sde,
iowait_get_ib_work(&txq->wait),
&txq->tx_list,
&count_out);
if (likely(!ret) || ret == -EBUSY || ret == -ECOMM)
return ret;
dd_dev_warn(txq->priv->dd, "cannot send skb tx list, err %d.\n", ret);
return ret;
}
static int hfi1_ipoib_flush_tx_list(struct net_device *dev,
struct hfi1_ipoib_txq *txq)
{
int ret = 0;
if (!list_empty(&txq->tx_list)) {
/* Flush the current list */
ret = hfi1_ipoib_submit_tx_list(dev, txq);
if (unlikely(ret))
if (ret != -EBUSY)
++dev->stats.tx_carrier_errors;
}
return ret;
}
static int hfi1_ipoib_submit_tx(struct hfi1_ipoib_txq *txq,
struct ipoib_txreq *tx)
{
int ret;
ret = sdma_send_txreq(txq->sde,
iowait_get_ib_work(&txq->wait),
&tx->txreq,
txq->pkts_sent);
if (likely(!ret)) {
txq->pkts_sent = true;
iowait_starve_clear(txq->pkts_sent, &txq->wait);
}
return ret;
}
static int hfi1_ipoib_send_dma_single(struct net_device *dev,
struct sk_buff *skb,
struct ipoib_txparms *txp)
{
struct hfi1_ipoib_dev_priv *priv = hfi1_ipoib_priv(dev);
struct hfi1_ipoib_txq *txq = txp->txq;
struct ipoib_txreq *tx;
int ret;
tx = hfi1_ipoib_send_dma_common(dev, skb, txp);
if (IS_ERR(tx)) {
int ret = PTR_ERR(tx);
dev_kfree_skb_any(skb);
if (ret == -ENOMEM)
++dev->stats.tx_errors;
else
++dev->stats.tx_carrier_errors;
return NETDEV_TX_OK;
}
ret = hfi1_ipoib_submit_tx(txq, tx);
if (likely(!ret)) {
trace_sdma_output_ibhdr(tx->priv->dd,
&tx->sdma_hdr.hdr,
ib_is_sc5(txp->flow.sc5));
hfi1_ipoib_check_queue_depth(txq);
return NETDEV_TX_OK;
}
txq->pkts_sent = false;
if (ret == -EBUSY) {
list_add_tail(&tx->txreq.list, &txq->tx_list);
trace_sdma_output_ibhdr(tx->priv->dd,
&tx->sdma_hdr.hdr,
ib_is_sc5(txp->flow.sc5));
hfi1_ipoib_check_queue_depth(txq);
return NETDEV_TX_OK;
}
if (ret == -ECOMM) {
hfi1_ipoib_check_queue_depth(txq);
return NETDEV_TX_OK;
}
sdma_txclean(priv->dd, &tx->txreq);
dev_kfree_skb_any(skb);
kmem_cache_free(priv->txreq_cache, tx);
++dev->stats.tx_carrier_errors;
return NETDEV_TX_OK;
}
static int hfi1_ipoib_send_dma_list(struct net_device *dev,
struct sk_buff *skb,
struct ipoib_txparms *txp)
{
struct hfi1_ipoib_txq *txq = txp->txq;
struct ipoib_txreq *tx;
/* Has the flow changed? */
if (txq->flow.as_int != txp->flow.as_int)
(void)hfi1_ipoib_flush_tx_list(dev, txq);
tx = hfi1_ipoib_send_dma_common(dev, skb, txp);
if (IS_ERR(tx)) {
int ret = PTR_ERR(tx);
dev_kfree_skb_any(skb);
if (ret == -ENOMEM)
++dev->stats.tx_errors;
else
++dev->stats.tx_carrier_errors;
return NETDEV_TX_OK;
}
list_add_tail(&tx->txreq.list, &txq->tx_list);
hfi1_ipoib_check_queue_depth(txq);
trace_sdma_output_ibhdr(tx->priv->dd,
&tx->sdma_hdr.hdr,
ib_is_sc5(txp->flow.sc5));
if (!netdev_xmit_more())
(void)hfi1_ipoib_flush_tx_list(dev, txq);
return NETDEV_TX_OK;
}
static u8 hfi1_ipoib_calc_entropy(struct sk_buff *skb)
{
if (skb_transport_header_was_set(skb)) {
u8 *hdr = (u8 *)skb_transport_header(skb);
return (hdr[0] ^ hdr[1] ^ hdr[2] ^ hdr[3]);
}
return (u8)skb_get_queue_mapping(skb);
}
int hfi1_ipoib_send_dma(struct net_device *dev,
struct sk_buff *skb,
struct ib_ah *address,
u32 dqpn)
{
struct hfi1_ipoib_dev_priv *priv = hfi1_ipoib_priv(dev);
struct ipoib_txparms txp;
struct rdma_netdev *rn = netdev_priv(dev);
if (unlikely(skb->len > rn->mtu + HFI1_IPOIB_ENCAP_LEN)) {
dd_dev_warn(priv->dd, "packet len %d (> %d) too long to send, dropping\n",
skb->len,
rn->mtu + HFI1_IPOIB_ENCAP_LEN);
++dev->stats.tx_dropped;
++dev->stats.tx_errors;
dev_kfree_skb_any(skb);
return NETDEV_TX_OK;
}
txp.dd = priv->dd;
txp.ah_attr = &ibah_to_rvtah(address)->attr;
txp.ibp = to_iport(priv->device, priv->port_num);
txp.txq = &priv->txqs[skb_get_queue_mapping(skb)];
txp.dqpn = dqpn;
txp.flow.sc5 = txp.ibp->sl_to_sc[rdma_ah_get_sl(txp.ah_attr)];
txp.flow.tx_queue = (u8)skb_get_queue_mapping(skb);
txp.entropy = hfi1_ipoib_calc_entropy(skb);
if (netdev_xmit_more() || !list_empty(&txp.txq->tx_list))
return hfi1_ipoib_send_dma_list(dev, skb, &txp);
return hfi1_ipoib_send_dma_single(dev, skb, &txp);
}
/*
* hfi1_ipoib_sdma_sleep - ipoib sdma sleep function
*
* This function gets called from sdma_send_txreq() when there are not enough
* sdma descriptors available to send the packet. It adds Tx queue's wait
* structure to sdma engine's dmawait list to be woken up when descriptors
* become available.
*/
static int hfi1_ipoib_sdma_sleep(struct sdma_engine *sde,
struct iowait_work *wait,
struct sdma_txreq *txreq,
uint seq,
bool pkts_sent)
{
struct hfi1_ipoib_txq *txq =
container_of(wait->iow, struct hfi1_ipoib_txq, wait);
write_seqlock(&sde->waitlock);
if (likely(txq->priv->netdev->reg_state == NETREG_REGISTERED)) {
if (sdma_progress(sde, seq, txreq)) {
write_sequnlock(&sde->waitlock);
return -EAGAIN;
}
netif_stop_subqueue(txq->priv->netdev, txq->q_idx);
if (list_empty(&txq->wait.list))
iowait_queue(pkts_sent, wait->iow, &sde->dmawait);
write_sequnlock(&sde->waitlock);
return -EBUSY;
}
write_sequnlock(&sde->waitlock);
return -EINVAL;
}
/*
* hfi1_ipoib_sdma_wakeup - ipoib sdma wakeup function
*
* This function gets called when SDMA descriptors becomes available and Tx
* queue's wait structure was previously added to sdma engine's dmawait list.
*/
static void hfi1_ipoib_sdma_wakeup(struct iowait *wait, int reason)
{
struct hfi1_ipoib_txq *txq =
container_of(wait, struct hfi1_ipoib_txq, wait);
if (likely(txq->priv->netdev->reg_state == NETREG_REGISTERED))
iowait_schedule(wait, system_highpri_wq, WORK_CPU_UNBOUND);
}
static void hfi1_ipoib_flush_txq(struct work_struct *work)
{
struct iowait_work *ioww =
container_of(work, struct iowait_work, iowork);
struct iowait *wait = iowait_ioww_to_iow(ioww);
struct hfi1_ipoib_txq *txq =
container_of(wait, struct hfi1_ipoib_txq, wait);
struct net_device *dev = txq->priv->netdev;
if (likely(dev->reg_state == NETREG_REGISTERED) &&
likely(__netif_subqueue_stopped(dev, txq->q_idx)) &&
likely(!hfi1_ipoib_flush_tx_list(dev, txq)))
netif_wake_subqueue(dev, txq->q_idx);
}
int hfi1_ipoib_txreq_init(struct hfi1_ipoib_dev_priv *priv)
{
struct net_device *dev = priv->netdev;
char buf[HFI1_IPOIB_TXREQ_NAME_LEN];
unsigned long tx_ring_size;
int i;
/*
* The ring holds one entry less than tx_ring_size.
* Round up to the next power of 2 in order to hold at least tx_queue_len.
*/
tx_ring_size = roundup_pow_of_two((unsigned long)dev->tx_queue_len + 1);
snprintf(buf, sizeof(buf), "hfi1_%u_ipoib_txreq_cache", priv->dd->unit);
priv->txreq_cache = kmem_cache_create(buf,
sizeof(struct ipoib_txreq),
0,
0,
NULL);
if (!priv->txreq_cache)
return -ENOMEM;
priv->tx_napis = kcalloc_node(dev->num_tx_queues,
sizeof(struct napi_struct),
GFP_ATOMIC,
priv->dd->node);
if (!priv->tx_napis)
goto free_txreq_cache;
priv->txqs = kcalloc_node(dev->num_tx_queues,
sizeof(struct hfi1_ipoib_txq),
GFP_ATOMIC,
priv->dd->node);
if (!priv->txqs)
goto free_tx_napis;
for (i = 0; i < dev->num_tx_queues; i++) {
struct hfi1_ipoib_txq *txq = &priv->txqs[i];
iowait_init(&txq->wait,
0,
hfi1_ipoib_flush_txq,
NULL,
hfi1_ipoib_sdma_sleep,
hfi1_ipoib_sdma_wakeup,
NULL,
NULL);
txq->priv = priv;
txq->sde = NULL;
INIT_LIST_HEAD(&txq->tx_list);
atomic64_set(&txq->complete_txreqs, 0);
txq->q_idx = i;
txq->flow.tx_queue = 0xff;
txq->flow.sc5 = 0xff;
txq->pkts_sent = false;
netdev_queue_numa_node_write(netdev_get_tx_queue(dev, i),
priv->dd->node);
txq->tx_ring.items =
vzalloc_node(array_size(tx_ring_size,
sizeof(struct ipoib_txreq)),
priv->dd->node);
if (!txq->tx_ring.items)
goto free_txqs;
spin_lock_init(&txq->tx_ring.producer_lock);
spin_lock_init(&txq->tx_ring.consumer_lock);
txq->tx_ring.max_items = tx_ring_size;
txq->napi = &priv->tx_napis[i];
netif_tx_napi_add(dev, txq->napi,
hfi1_ipoib_process_tx_ring,
NAPI_POLL_WEIGHT);
}
return 0;
free_txqs:
for (i--; i >= 0; i--) {
struct hfi1_ipoib_txq *txq = &priv->txqs[i];
netif_napi_del(txq->napi);
vfree(txq->tx_ring.items);
}
kfree(priv->txqs);
priv->txqs = NULL;
free_tx_napis:
kfree(priv->tx_napis);
priv->tx_napis = NULL;
free_txreq_cache:
kmem_cache_destroy(priv->txreq_cache);
priv->txreq_cache = NULL;
return -ENOMEM;
}
static void hfi1_ipoib_drain_tx_list(struct hfi1_ipoib_txq *txq)
{
struct sdma_txreq *txreq;
struct sdma_txreq *txreq_tmp;
atomic64_t *complete_txreqs = &txq->complete_txreqs;
list_for_each_entry_safe(txreq, txreq_tmp, &txq->tx_list, list) {
struct ipoib_txreq *tx =
container_of(txreq, struct ipoib_txreq, txreq);
list_del(&txreq->list);
sdma_txclean(txq->priv->dd, &tx->txreq);
dev_kfree_skb_any(tx->skb);
kmem_cache_free(txq->priv->txreq_cache, tx);
atomic64_inc(complete_txreqs);
}
if (hfi1_ipoib_txreqs(txq->sent_txreqs, atomic64_read(complete_txreqs)))
dd_dev_warn(txq->priv->dd,
"txq %d not empty found %llu requests\n",
txq->q_idx,
hfi1_ipoib_txreqs(txq->sent_txreqs,
atomic64_read(complete_txreqs)));
}
void hfi1_ipoib_txreq_deinit(struct hfi1_ipoib_dev_priv *priv)
{
int i;
for (i = 0; i < priv->netdev->num_tx_queues; i++) {
struct hfi1_ipoib_txq *txq = &priv->txqs[i];
iowait_cancel_work(&txq->wait);
iowait_sdma_drain(&txq->wait);
hfi1_ipoib_drain_tx_list(txq);
netif_napi_del(txq->napi);
(void)hfi1_ipoib_drain_tx_ring(txq, txq->tx_ring.max_items);
vfree(txq->tx_ring.items);
}
kfree(priv->txqs);
priv->txqs = NULL;
kfree(priv->tx_napis);
priv->tx_napis = NULL;
kmem_cache_destroy(priv->txreq_cache);
priv->txreq_cache = NULL;
}
void hfi1_ipoib_napi_tx_enable(struct net_device *dev)
{
struct hfi1_ipoib_dev_priv *priv = hfi1_ipoib_priv(dev);
int i;
for (i = 0; i < dev->num_tx_queues; i++) {
struct hfi1_ipoib_txq *txq = &priv->txqs[i];
napi_enable(txq->napi);
}
}
void hfi1_ipoib_napi_tx_disable(struct net_device *dev)
{
struct hfi1_ipoib_dev_priv *priv = hfi1_ipoib_priv(dev);
int i;
for (i = 0; i < dev->num_tx_queues; i++) {
struct hfi1_ipoib_txq *txq = &priv->txqs[i];
napi_disable(txq->napi);
(void)hfi1_ipoib_drain_tx_ring(txq, txq->tx_ring.max_items);
}
}


@@ -1,6 +1,6 @@
// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
/*
* Copyright(c) 2018 Intel Corporation.
* Copyright(c) 2018 - 2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@@ -49,6 +49,7 @@
#include "hfi.h"
#include "affinity.h"
#include "sdma.h"
#include "netdev.h"
/**
* msix_initialize() - Calculate, request and configure MSIx IRQs
@@ -69,7 +70,7 @@ int msix_initialize(struct hfi1_devdata *dd)
* one for each VNIC context
* ...any new IRQs should be added here.
*/
total = 1 + dd->num_sdma + dd->n_krcv_queues + dd->num_vnic_contexts;
total = 1 + dd->num_sdma + dd->n_krcv_queues + dd->num_netdev_contexts;
if (total >= CCE_NUM_MSIX_VECTORS)
return -EINVAL;
@@ -140,7 +141,7 @@ static int msix_request_irq(struct hfi1_devdata *dd, void *arg,
ret = pci_request_irq(dd->pcidev, nr, handler, thread, arg, name);
if (ret) {
dd_dev_err(dd,
"%s: request for IRQ %d failed, MSIx %lu, err %d\n",
"%s: request for IRQ %d failed, MSIx %lx, err %d\n",
name, irq, nr, ret);
spin_lock(&dd->msix_info.msix_lock);
__clear_bit(nr, dd->msix_info.in_use_msix);
@@ -160,7 +161,7 @@ static int msix_request_irq(struct hfi1_devdata *dd, void *arg,
/* This is a request, so a failure is not fatal */
ret = hfi1_get_irq_affinity(dd, me);
if (ret)
dd_dev_err(dd, "unable to pin IRQ %d\n", ret);
dd_dev_err(dd, "%s: unable to pin IRQ %d\n", name, ret);
return nr;
}
@@ -171,7 +172,8 @@ static int msix_request_rcd_irq_common(struct hfi1_ctxtdata *rcd,
const char *name)
{
int nr = msix_request_irq(rcd->dd, rcd, handler, thread,
IRQ_RCVCTXT, name);
rcd->is_vnic ? IRQ_NETDEVCTXT : IRQ_RCVCTXT,
name);
if (nr < 0)
return nr;
@@ -203,6 +205,21 @@ int msix_request_rcd_irq(struct hfi1_ctxtdata *rcd)
receive_context_thread, name);
}
/**
* msix_netdev_request_rcd_irq() - Helper function for RCVAVAIL IRQs
* for a netdev context
* @rcd: valid netdev context
*/
int msix_netdev_request_rcd_irq(struct hfi1_ctxtdata *rcd)
{
char name[MAX_NAME_SIZE];
snprintf(name, sizeof(name), DRIVER_NAME "_%d nd kctxt%d",
rcd->dd->unit, rcd->ctxt);
return msix_request_rcd_irq_common(rcd, receive_context_interrupt_napi,
NULL, name);
}
/**
* msix_request_sdma_irq() - Helper for getting SDMA IRQ resources
* @sde: valid sdma engine
@@ -355,15 +372,16 @@ void msix_clean_up_interrupts(struct hfi1_devdata *dd)
}
/**
* msix_vnic_syncrhonize_irq() - Vnic IRQ synchronize
* msix_netdev_synchronize_irq() - netdev IRQ synchronize
* @dd: valid devdata
*/
void msix_vnic_synchronize_irq(struct hfi1_devdata *dd)
void msix_netdev_synchronize_irq(struct hfi1_devdata *dd)
{
int i;
int ctxt_count = hfi1_netdev_ctxt_count(dd);
for (i = 0; i < dd->vnic.num_ctxt; i++) {
struct hfi1_ctxtdata *rcd = dd->vnic.ctxt[i];
for (i = 0; i < ctxt_count; i++) {
struct hfi1_ctxtdata *rcd = hfi1_netdev_get_ctxt(dd, i);
struct hfi1_msix_entry *me;
me = &dd->msix_info.msix_entries[rcd->msix_intr];


@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
/*
* Copyright(c) 2018 Intel Corporation.
* Copyright(c) 2018 - 2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@@ -59,7 +59,8 @@ int msix_request_rcd_irq(struct hfi1_ctxtdata *rcd);
int msix_request_sdma_irq(struct sdma_engine *sde);
void msix_free_irq(struct hfi1_devdata *dd, u8 msix_intr);
/* VNIC interface */
void msix_vnic_synchronize_irq(struct hfi1_devdata *dd);
/* Netdev interface */
void msix_netdev_synchronize_irq(struct hfi1_devdata *dd);
int msix_netdev_request_rcd_irq(struct hfi1_ctxtdata *rcd);
#endif


@@ -0,0 +1,118 @@
/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
/*
* Copyright(c) 2020 Intel Corporation.
*
*/
#ifndef HFI1_NETDEV_H
#define HFI1_NETDEV_H
#include "hfi.h"
#include <linux/netdevice.h>
#include <linux/xarray.h>
/**
* struct hfi1_netdev_rxq - Receive Queue for HFI
* dummy netdev. Both IPoIB and VNIC netdevices work on
* top of this device.
* @napi: napi object
* @priv: ptr to netdev_priv
* @rcd: ptr to receive context data
*/
struct hfi1_netdev_rxq {
struct napi_struct napi;
struct hfi1_netdev_priv *priv;
struct hfi1_ctxtdata *rcd;
};
/*
* Number of netdev contexts used. Ensure it is less than or equal to
* max queues supported by VNIC (HFI1_VNIC_MAX_QUEUE).
*/
#define HFI1_MAX_NETDEV_CTXTS 8
/* Number of NETDEV RSM entries */
#define NUM_NETDEV_MAP_ENTRIES HFI1_MAX_NETDEV_CTXTS
/**
* struct hfi1_netdev_priv: data required to setup and run HFI netdev.
* @dd: hfi1_devdata
* @rxq: pointer to dummy netdev receive queues.
* @num_rx_q: number of receive queues
* @rmt_start: first free index in the RMT array
* @dev_tbl: netdev table of unique identifiers for VNIC and IPoIB VLANs.
* @enabled: atomic counter of netdevs enabling receive queues.
* When 0 NAPI will be disabled.
* @netdevs: atomic counter of netdevs using dummy netdev.
* When 0 receive queues will be freed.
*/
struct hfi1_netdev_priv {
struct hfi1_devdata *dd;
struct hfi1_netdev_rxq *rxq;
int num_rx_q;
int rmt_start;
struct xarray dev_tbl;
/* count of enabled napi polls */
atomic_t enabled;
/* count of netdevs on top */
atomic_t netdevs;
};
static inline
struct hfi1_netdev_priv *hfi1_netdev_priv(struct net_device *dev)
{
return (struct hfi1_netdev_priv *)&dev[1];
}
static inline
int hfi1_netdev_ctxt_count(struct hfi1_devdata *dd)
{
struct hfi1_netdev_priv *priv = hfi1_netdev_priv(dd->dummy_netdev);
return priv->num_rx_q;
}
static inline
struct hfi1_ctxtdata *hfi1_netdev_get_ctxt(struct hfi1_devdata *dd, int ctxt)
{
struct hfi1_netdev_priv *priv = hfi1_netdev_priv(dd->dummy_netdev);
return priv->rxq[ctxt].rcd;
}
static inline
int hfi1_netdev_get_free_rmt_idx(struct hfi1_devdata *dd)
{
struct hfi1_netdev_priv *priv = hfi1_netdev_priv(dd->dummy_netdev);
return priv->rmt_start;
}
static inline
void hfi1_netdev_set_free_rmt_idx(struct hfi1_devdata *dd, int rmt_idx)
{
struct hfi1_netdev_priv *priv = hfi1_netdev_priv(dd->dummy_netdev);
priv->rmt_start = rmt_idx;
}
u32 hfi1_num_netdev_contexts(struct hfi1_devdata *dd, u32 available_contexts,
struct cpumask *cpu_mask);
void hfi1_netdev_enable_queues(struct hfi1_devdata *dd);
void hfi1_netdev_disable_queues(struct hfi1_devdata *dd);
int hfi1_netdev_rx_init(struct hfi1_devdata *dd);
int hfi1_netdev_rx_destroy(struct hfi1_devdata *dd);
int hfi1_netdev_alloc(struct hfi1_devdata *dd);
void hfi1_netdev_free(struct hfi1_devdata *dd);
int hfi1_netdev_add_data(struct hfi1_devdata *dd, int id, void *data);
void *hfi1_netdev_remove_data(struct hfi1_devdata *dd, int id);
void *hfi1_netdev_get_data(struct hfi1_devdata *dd, int id);
void *hfi1_netdev_get_first_data(struct hfi1_devdata *dd, int *start_id);
/* chip.c */
int hfi1_netdev_rx_napi(struct napi_struct *napi, int budget);
#endif /* HFI1_NETDEV_H */


@@ -0,0 +1,481 @@
// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
/*
* Copyright(c) 2020 Intel Corporation.
*
*/
/*
* This file contains HFI1 support for netdev RX functionality
*/
#include "sdma.h"
#include "verbs.h"
#include "netdev.h"
#include "hfi.h"
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <rdma/ib_verbs.h>
static int hfi1_netdev_setup_ctxt(struct hfi1_netdev_priv *priv,
struct hfi1_ctxtdata *uctxt)
{
unsigned int rcvctrl_ops;
struct hfi1_devdata *dd = priv->dd;
int ret;
uctxt->rhf_rcv_function_map = netdev_rhf_rcv_functions;
uctxt->do_interrupt = &handle_receive_interrupt_napi_sp;
/* Now allocate the RcvHdr queue and eager buffers. */
ret = hfi1_create_rcvhdrq(dd, uctxt);
if (ret)
goto done;
ret = hfi1_setup_eagerbufs(uctxt);
if (ret)
goto done;
clear_rcvhdrtail(uctxt);
rcvctrl_ops = HFI1_RCVCTRL_CTXT_DIS;
rcvctrl_ops |= HFI1_RCVCTRL_INTRAVAIL_DIS;
if (!HFI1_CAP_KGET_MASK(uctxt->flags, MULTI_PKT_EGR))
rcvctrl_ops |= HFI1_RCVCTRL_ONE_PKT_EGR_ENB;
if (HFI1_CAP_KGET_MASK(uctxt->flags, NODROP_EGR_FULL))
rcvctrl_ops |= HFI1_RCVCTRL_NO_EGR_DROP_ENB;
if (HFI1_CAP_KGET_MASK(uctxt->flags, NODROP_RHQ_FULL))
rcvctrl_ops |= HFI1_RCVCTRL_NO_RHQ_DROP_ENB;
if (HFI1_CAP_KGET_MASK(uctxt->flags, DMA_RTAIL))
rcvctrl_ops |= HFI1_RCVCTRL_TAILUPD_ENB;
hfi1_rcvctrl(uctxt->dd, rcvctrl_ops, uctxt);
done:
return ret;
}
static int hfi1_netdev_allocate_ctxt(struct hfi1_devdata *dd,
struct hfi1_ctxtdata **ctxt)
{
struct hfi1_ctxtdata *uctxt;
int ret;
if (dd->flags & HFI1_FROZEN)
return -EIO;
ret = hfi1_create_ctxtdata(dd->pport, dd->node, &uctxt);
if (ret < 0) {
dd_dev_err(dd, "Unable to create ctxtdata, failing open\n");
return -ENOMEM;
}
uctxt->flags = HFI1_CAP_KGET(MULTI_PKT_EGR) |
HFI1_CAP_KGET(NODROP_RHQ_FULL) |
HFI1_CAP_KGET(NODROP_EGR_FULL) |
HFI1_CAP_KGET(DMA_RTAIL);
/* Netdev contexts are always NO_RDMA_RTAIL */
uctxt->fast_handler = handle_receive_interrupt_napi_fp;
uctxt->slow_handler = handle_receive_interrupt_napi_sp;
hfi1_set_seq_cnt(uctxt, 1);
uctxt->is_vnic = true;
hfi1_stats.sps_ctxts++;
dd_dev_info(dd, "created netdev context %d\n", uctxt->ctxt);
*ctxt = uctxt;
return 0;
}
static void hfi1_netdev_deallocate_ctxt(struct hfi1_devdata *dd,
struct hfi1_ctxtdata *uctxt)
{
flush_wc();
/*
* Disable receive context and interrupt available, reset all
* RcvCtxtCtrl bits to default values.
*/
hfi1_rcvctrl(dd, HFI1_RCVCTRL_CTXT_DIS |
HFI1_RCVCTRL_TIDFLOW_DIS |
HFI1_RCVCTRL_INTRAVAIL_DIS |
HFI1_RCVCTRL_ONE_PKT_EGR_DIS |
HFI1_RCVCTRL_NO_RHQ_DROP_DIS |
HFI1_RCVCTRL_NO_EGR_DROP_DIS, uctxt);
if (uctxt->msix_intr != CCE_NUM_MSIX_VECTORS)
msix_free_irq(dd, uctxt->msix_intr);
uctxt->msix_intr = CCE_NUM_MSIX_VECTORS;
uctxt->event_flags = 0;
hfi1_clear_tids(uctxt);
hfi1_clear_ctxt_pkey(dd, uctxt);
hfi1_stats.sps_ctxts--;
hfi1_free_ctxt(uctxt);
}
static int hfi1_netdev_allot_ctxt(struct hfi1_netdev_priv *priv,
struct hfi1_ctxtdata **ctxt)
{
int rc;
struct hfi1_devdata *dd = priv->dd;
rc = hfi1_netdev_allocate_ctxt(dd, ctxt);
if (rc) {
dd_dev_err(dd, "netdev ctxt alloc failed %d\n", rc);
return rc;
}
rc = hfi1_netdev_setup_ctxt(priv, *ctxt);
if (rc) {
dd_dev_err(dd, "netdev ctxt setup failed %d\n", rc);
hfi1_netdev_deallocate_ctxt(dd, *ctxt);
*ctxt = NULL;
}
return rc;
}
/**
* hfi1_num_netdev_contexts - Count of netdev recv contexts to use.
* @dd: device on which to allocate netdev contexts
* @available_contexts: count of available receive contexts
* @cpu_mask: mask of possible cpus to include for contexts
*
* Return: count of physical cores on a node or the remaining available recv
* contexts for netdev recv context usage up to the maximum of
* HFI1_MAX_NETDEV_CTXTS.
* A value of 0 can be returned when acceleration is explicitly turned off,
* when a memory allocation error occurs, or when there are no available contexts.
*
*/
u32 hfi1_num_netdev_contexts(struct hfi1_devdata *dd, u32 available_contexts,
struct cpumask *cpu_mask)
{
cpumask_var_t node_cpu_mask;
unsigned int available_cpus;
if (!HFI1_CAP_IS_KSET(AIP))
return 0;
/* Always give user contexts priority over netdev contexts */
if (available_contexts == 0) {
dd_dev_info(dd, "No receive contexts available for netdevs.\n");
return 0;
}
if (!zalloc_cpumask_var(&node_cpu_mask, GFP_KERNEL)) {
dd_dev_err(dd, "Unable to allocate cpu_mask for netdevs.\n");
return 0;
}
cpumask_and(node_cpu_mask, cpu_mask,
cpumask_of_node(pcibus_to_node(dd->pcidev->bus)));
available_cpus = cpumask_weight(node_cpu_mask);
free_cpumask_var(node_cpu_mask);
return min3(available_cpus, available_contexts,
(u32)HFI1_MAX_NETDEV_CTXTS);
}
static int hfi1_netdev_rxq_init(struct net_device *dev)
{
int i;
int rc;
struct hfi1_netdev_priv *priv = hfi1_netdev_priv(dev);
struct hfi1_devdata *dd = priv->dd;
priv->num_rx_q = dd->num_netdev_contexts;
priv->rxq = kcalloc_node(priv->num_rx_q, sizeof(struct hfi1_netdev_rxq),
GFP_KERNEL, dd->node);
if (!priv->rxq) {
dd_dev_err(dd, "Unable to allocate netdev queue data\n");
return -ENOMEM;
}
for (i = 0; i < priv->num_rx_q; i++) {
struct hfi1_netdev_rxq *rxq = &priv->rxq[i];
rc = hfi1_netdev_allot_ctxt(priv, &rxq->rcd);
if (rc)
goto bail_context_irq_failure;
hfi1_rcd_get(rxq->rcd);
rxq->priv = priv;
rxq->rcd->napi = &rxq->napi;
dd_dev_info(dd, "Setting rcv queue %d napi to context %d\n",
i, rxq->rcd->ctxt);
/*
* Disable BUSY_POLL on this NAPI as this is not supported
* right now.
*/
set_bit(NAPI_STATE_NO_BUSY_POLL, &rxq->napi.state);
netif_napi_add(dev, &rxq->napi, hfi1_netdev_rx_napi, 64);
rc = msix_netdev_request_rcd_irq(rxq->rcd);
if (rc)
goto bail_context_irq_failure;
}
return 0;
bail_context_irq_failure:
dd_dev_err(dd, "Unable to allot receive context\n");
for (; i >= 0; i--) {
struct hfi1_netdev_rxq *rxq = &priv->rxq[i];
if (rxq->rcd) {
hfi1_netdev_deallocate_ctxt(dd, rxq->rcd);
hfi1_rcd_put(rxq->rcd);
rxq->rcd = NULL;
}
}
kfree(priv->rxq);
priv->rxq = NULL;
return rc;
}
static void hfi1_netdev_rxq_deinit(struct net_device *dev)
{
int i;
struct hfi1_netdev_priv *priv = hfi1_netdev_priv(dev);
struct hfi1_devdata *dd = priv->dd;
for (i = 0; i < priv->num_rx_q; i++) {
struct hfi1_netdev_rxq *rxq = &priv->rxq[i];
netif_napi_del(&rxq->napi);
hfi1_netdev_deallocate_ctxt(dd, rxq->rcd);
hfi1_rcd_put(rxq->rcd);
rxq->rcd = NULL;
}
kfree(priv->rxq);
priv->rxq = NULL;
priv->num_rx_q = 0;
}
static void enable_queues(struct hfi1_netdev_priv *priv)
{
int i;
for (i = 0; i < priv->num_rx_q; i++) {
struct hfi1_netdev_rxq *rxq = &priv->rxq[i];
dd_dev_info(priv->dd, "enabling queue %d on context %d\n", i,
rxq->rcd->ctxt);
napi_enable(&rxq->napi);
hfi1_rcvctrl(priv->dd,
HFI1_RCVCTRL_CTXT_ENB | HFI1_RCVCTRL_INTRAVAIL_ENB,
rxq->rcd);
}
}
static void disable_queues(struct hfi1_netdev_priv *priv)
{
int i;
msix_netdev_synchronize_irq(priv->dd);
for (i = 0; i < priv->num_rx_q; i++) {
struct hfi1_netdev_rxq *rxq = &priv->rxq[i];
dd_dev_info(priv->dd, "disabling queue %d on context %d\n", i,
rxq->rcd->ctxt);
/* wait for napi if it was scheduled */
hfi1_rcvctrl(priv->dd,
HFI1_RCVCTRL_CTXT_DIS | HFI1_RCVCTRL_INTRAVAIL_DIS,
rxq->rcd);
napi_synchronize(&rxq->napi);
napi_disable(&rxq->napi);
}
}
/**
* hfi1_netdev_rx_init - Increments the netdevs counter. When called the
* first time, it allocates the receive queue data and calls
* netif_napi_add for each queue.
*
* @dd: hfi1 dev data
*/
int hfi1_netdev_rx_init(struct hfi1_devdata *dd)
{
struct hfi1_netdev_priv *priv = hfi1_netdev_priv(dd->dummy_netdev);
int res;
if (atomic_fetch_inc(&priv->netdevs))
return 0;
mutex_lock(&hfi1_mutex);
init_dummy_netdev(dd->dummy_netdev);
res = hfi1_netdev_rxq_init(dd->dummy_netdev);
mutex_unlock(&hfi1_mutex);
return res;
}
/**
* hfi1_netdev_rx_destroy - Decrements the netdevs counter; when it reaches 0,
* napi is deleted and the receive queue memory is freed.
*
* @dd: hfi1 dev data
*/
int hfi1_netdev_rx_destroy(struct hfi1_devdata *dd)
{
struct hfi1_netdev_priv *priv = hfi1_netdev_priv(dd->dummy_netdev);
/* destroy the RX queues only if it is the last netdev going away */
if (atomic_fetch_add_unless(&priv->netdevs, -1, 0) == 1) {
mutex_lock(&hfi1_mutex);
hfi1_netdev_rxq_deinit(dd->dummy_netdev);
mutex_unlock(&hfi1_mutex);
}
return 0;
}
/**
* hfi1_netdev_alloc - Allocates the netdev and private data. Required
* because the RMT index and MSI-X interrupts can be set only
* during driver initialization.
*
* @dd: hfi1 dev data
*/
int hfi1_netdev_alloc(struct hfi1_devdata *dd)
{
struct hfi1_netdev_priv *priv;
const int netdev_size = sizeof(*dd->dummy_netdev) +
sizeof(struct hfi1_netdev_priv);
dd_dev_info(dd, "allocating netdev size %d\n", netdev_size);
dd->dummy_netdev = kcalloc_node(1, netdev_size, GFP_KERNEL, dd->node);
if (!dd->dummy_netdev)
return -ENOMEM;
priv = hfi1_netdev_priv(dd->dummy_netdev);
priv->dd = dd;
xa_init(&priv->dev_tbl);
atomic_set(&priv->enabled, 0);
atomic_set(&priv->netdevs, 0);
return 0;
}
void hfi1_netdev_free(struct hfi1_devdata *dd)
{
if (dd->dummy_netdev) {
dd_dev_info(dd, "hfi1 netdev freed\n");
free_netdev(dd->dummy_netdev);
dd->dummy_netdev = NULL;
}
}
/**
* hfi1_netdev_enable_queues - NAPI enable function.
* Enables the napi objects associated with the queues.
* Each enabling device increments an atomic counter; the disable
* function decrements it and, when it reaches 0, calls napi_disable
* for every queue.
*
* @dd: hfi1 dev data
*/
void hfi1_netdev_enable_queues(struct hfi1_devdata *dd)
{
struct hfi1_netdev_priv *priv;
if (!dd->dummy_netdev)
return;
priv = hfi1_netdev_priv(dd->dummy_netdev);
if (atomic_fetch_inc(&priv->enabled))
return;
mutex_lock(&hfi1_mutex);
enable_queues(priv);
mutex_unlock(&hfi1_mutex);
}
void hfi1_netdev_disable_queues(struct hfi1_devdata *dd)
{
struct hfi1_netdev_priv *priv;
if (!dd->dummy_netdev)
return;
priv = hfi1_netdev_priv(dd->dummy_netdev);
if (atomic_dec_if_positive(&priv->enabled))
return;
mutex_lock(&hfi1_mutex);
disable_queues(priv);
mutex_unlock(&hfi1_mutex);
}
/**
* hfi1_netdev_add_data - Registers data with a unique identifier
* so it can be requested later; this is needed by the VNIC and
* IPoIB VLAN implementations.
* Synchronization is provided by the xarray's internal lock.
*
* @dd: hfi1 dev data
* @id: requested integer id up to INT_MAX
* @data: data to be associated with index
*/
int hfi1_netdev_add_data(struct hfi1_devdata *dd, int id, void *data)
{
struct hfi1_netdev_priv *priv = hfi1_netdev_priv(dd->dummy_netdev);
return xa_insert(&priv->dev_tbl, id, data, GFP_NOWAIT);
}
/**
* hfi1_netdev_remove_data - Removes data with a previously given id.
* Returns the reference to the removed entry.
*
* @dd: hfi1 dev data
* @id: requested integer id up to INT_MAX
*/
void *hfi1_netdev_remove_data(struct hfi1_devdata *dd, int id)
{
struct hfi1_netdev_priv *priv = hfi1_netdev_priv(dd->dummy_netdev);
return xa_erase(&priv->dev_tbl, id);
}
/**
* hfi1_netdev_get_data - Gets data with given id
*
* @dd: hfi1 dev data
* @id: requested integer id up to INT_MAX
*/
void *hfi1_netdev_get_data(struct hfi1_devdata *dd, int id)
{
struct hfi1_netdev_priv *priv = hfi1_netdev_priv(dd->dummy_netdev);
return xa_load(&priv->dev_tbl, id);
}
/**
* hfi1_netdev_get_first_data - Gets the first entry with an id greater
* than or equal to *start_id.
*
* @dd: hfi1 dev data
* @start_id: requested lower bound for the id; updated to the id found
*/
void *hfi1_netdev_get_first_data(struct hfi1_devdata *dd, int *start_id)
{
struct hfi1_netdev_priv *priv = hfi1_netdev_priv(dd->dummy_netdev);
unsigned long index = *start_id;
void *ret;
ret = xa_find(&priv->dev_tbl, &index, UINT_MAX, XA_PRESENT);
*start_id = (int)index;
return ret;
}


@@ -1,5 +1,5 @@
/*
- * Copyright(c) 2015 - 2019 Intel Corporation.
+ * Copyright(c) 2015 - 2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@@ -186,15 +186,6 @@ static void flush_iowait(struct rvt_qp *qp)
write_sequnlock_irqrestore(lock, flags);
}
- static inline int opa_mtu_enum_to_int(int mtu)
- {
- switch (mtu) {
- case OPA_MTU_8192: return 8192;
- case OPA_MTU_10240: return 10240;
- default: return -1;
- }
- }
/**
* This function is what we would push to the core layer if we wanted to be a
* "first class citizen". Instead we hide this here and rely on Verbs ULPs
@@ -202,15 +193,10 @@ static inline int opa_mtu_enum_to_int(int mtu)
*/
static inline int verbs_mtu_enum_to_int(struct ib_device *dev, enum ib_mtu mtu)
{
- int val;
/* Constraining 10KB packets to 8KB packets */
if (mtu == (enum ib_mtu)OPA_MTU_10240)
mtu = OPA_MTU_8192;
- val = opa_mtu_enum_to_int((int)mtu);
- if (val > 0)
- return val;
- return ib_mtu_enum_to_int(mtu);
+ return opa_mtu_enum_to_int((enum opa_mtu)mtu);
}
int hfi1_check_modify_qp(struct rvt_qp *qp, struct ib_qp_attr *attr,


@@ -1,6 +1,6 @@
// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
/*
- * Copyright(c) 2018 Intel Corporation.
+ * Copyright(c) 2018 - 2020 Intel Corporation.
*
*/
@@ -194,7 +194,7 @@ void tid_rdma_opfn_init(struct rvt_qp *qp, struct tid_rdma_params *p)
{
struct hfi1_qp_priv *priv = qp->priv;
- p->qp = (kdeth_qp << 16) | priv->rcd->ctxt;
+ p->qp = (RVT_KDETH_QP_PREFIX << 16) | priv->rcd->ctxt;
p->max_len = TID_RDMA_MAX_SEGMENT_SIZE;
p->jkey = priv->rcd->jkey;
p->max_read = TID_RDMA_MAX_READ_SEGS_PER_REQ;


@@ -1,5 +1,5 @@
/*
- * Copyright(c) 2015 - 2018 Intel Corporation.
+ * Copyright(c) 2015 - 2020 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@@ -47,6 +47,7 @@
#define CREATE_TRACE_POINTS
#include "trace.h"
#include "exp_rcv.h"
+ #include "ipoib.h"
static u8 __get_ib_hdr_len(struct ib_header *hdr)
{
@@ -126,6 +127,7 @@ const char *hfi1_trace_get_packet_l2_str(u8 l2)
#define RETH_PRN "reth vaddr:0x%.16llx rkey:0x%.8x dlen:0x%.8x"
#define AETH_PRN "aeth syn:0x%.2x %s msn:0x%.8x"
#define DETH_PRN "deth qkey:0x%.8x sqpn:0x%.6x"
+ #define DETH_ENTROPY_PRN "deth qkey:0x%.8x sqpn:0x%.6x entropy:0x%.2x"
#define IETH_PRN "ieth rkey:0x%.8x"
#define ATOMICACKETH_PRN "origdata:%llx"
#define ATOMICETH_PRN "vaddr:0x%llx rkey:0x%.8x sdata:%llx cdata:%llx"
@@ -444,6 +446,12 @@ const char *parse_everbs_hdrs(
break;
/* deth */
case OP(UD, SEND_ONLY):
+ trace_seq_printf(p, DETH_ENTROPY_PRN,
+ be32_to_cpu(eh->ud.deth[0]),
+ be32_to_cpu(eh->ud.deth[1]) & RVT_QPN_MASK,
+ be32_to_cpu(eh->ud.deth[1]) >>
+ HFI1_IPOIB_ENTROPY_SHIFT);
+ break;
case OP(UD, SEND_ONLY_WITH_IMMEDIATE):
trace_seq_printf(p, DETH_PRN,
be32_to_cpu(eh->ud.deth[0]),
@@ -512,6 +520,38 @@ u16 hfi1_trace_get_tid_idx(u32 ent)
return EXP_TID_GET(ent, IDX);
}
+ struct hfi1_ctxt_hist {
+ atomic_t count;
+ atomic_t data[255];
+ };
+ struct hfi1_ctxt_hist hist = {
+ .count = ATOMIC_INIT(0)
+ };
+ const char *hfi1_trace_print_rsm_hist(struct trace_seq *p, unsigned int ctxt)
+ {
+ int i, len = ARRAY_SIZE(hist.data);
+ const char *ret = trace_seq_buffer_ptr(p);
+ unsigned long packet_count = atomic_fetch_inc(&hist.count);
+ trace_seq_printf(p, "packet[%lu]", packet_count);
+ for (i = 0; i < len; ++i) {
+ unsigned long val;
+ atomic_t *count = &hist.data[i];
+ if (ctxt == i)
+ val = atomic_fetch_inc(count);
+ else
+ val = atomic_read(count);
+ if (val)
+ trace_seq_printf(p, "(%d:%lu)", i, val);
+ }
+ trace_seq_putc(p, 0);
+ return ret;
+ }
__hfi1_trace_fn(AFFINITY);
__hfi1_trace_fn(PKT);
__hfi1_trace_fn(PROC);
