The InfiniBand docs are plain text with no markups. So, all we needed to do were to add the title markups and some markup sequences in order to properly parse tables, lists and literal blocks. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

=================================================================
Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC)
=================================================================

Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC) feature
supports Ethernet functionality over Omni-Path fabric by encapsulating
the Ethernet packets between HFI nodes.

Architecture
============
The exchange of Omni-Path encapsulated Ethernet packets involves one or
more virtual Ethernet switches overlaid on the Omni-Path fabric topology.
A subset of HFI nodes on the Omni-Path fabric is permitted to exchange
encapsulated Ethernet packets across a particular virtual Ethernet switch.
The virtual Ethernet switches are logical abstractions achieved by
configuring the HFI nodes on the fabric for header generation and
processing. In the simplest configuration, all HFI nodes across the fabric
exchange encapsulated Ethernet packets over a single virtual Ethernet
switch. A virtual Ethernet switch is effectively an independent Ethernet
network. The configuration is performed by an Ethernet Manager (EM), which
is part of the trusted Fabric Manager (FM) application. HFI nodes can have
multiple VNICs, each connected to a different virtual Ethernet switch. The
diagram below presents a case of two virtual Ethernet switches with two HFI
nodes::

                               +-------------------+
                               |      Subnet/      |
                               |     Ethernet      |
                               |      Manager      |
                               +-------------------+
                                  /          /
                                /           /
                              /            /
                            /             /
  +-----------------------------+  +------------------------------+
  |  Virtual Ethernet Switch    |  |  Virtual Ethernet Switch     |
  |  +---------+    +---------+ |  | +---------+    +---------+   |
  |  | VPORT   |    |  VPORT  | |  | |  VPORT  |    |  VPORT  |   |
  +--+---------+----+---------+-+  +-+---------+----+---------+---+
           |                 \        /                 |
           |                   \    /                   |
           |                     \/                     |
           |                    /  \                    |
           |                  /      \                  |
      +-----------+------------+  +-----------+------------+
      |   VNIC    |    VNIC    |  |    VNIC   |    VNIC    |
      +-----------+------------+  +-----------+------------+
      |          HFI           |  |          HFI           |
      +------------------------+  +------------------------+

The Omni-Path encapsulated Ethernet packet format is as described below.

==================== ================================
Bits                 Field
==================== ================================
Quad Word 0:
0-19                 SLID (lower 20 bits)
20-30                Length (in Quad Words)
31                   BECN bit
32-51                DLID (lower 20 bits)
52-56                SC (Service Class)
57-59                RC (Routing Control)
60                   FECN bit
61-62                L2 (=10, 16B format)
63                   LT (=1, Link Transfer Head Flit)

Quad Word 1:
0-7                  L4 type (=0x78 ETHERNET)
8-11                 SLID[23:20]
12-15                DLID[23:20]
16-31                PKEY
32-47                Entropy
48-63                Reserved

Quad Word 2:
0-15                 Reserved
16-31                L4 header
32-63                Ethernet Packet

Quad Words 3 to N-1:
0-63                 Ethernet packet (pad extended)

Quad Word N (last):
0-23                 Ethernet packet (pad extended)
24-55                ICRC
56-61                Tail
62-63                LT (=01, Link Transfer Tail Flit)
==================== ================================
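
As an illustration of the layout in the table above, the following Python
sketch packs Quad Words 0 and 1 from their fields. It assumes that bit 0 is
the least significant bit of each quad word, which the table does not state
explicitly. The helper names (``put_field``, ``pack_qw0``, ``pack_qw1``) are
hypothetical and are not taken from the driver.

```python
L4_TYPE_ETHERNET = 0x78  # "L4 type" value from the table
L2_16B = 0b10            # "L2" value for the 16B format
LT_HEAD = 1              # "LT" value for a Link Transfer Head Flit


def put_field(value, lo, hi):
    """Place value into bits lo..hi (inclusive); bit 0 is assumed to be the LSB."""
    width = hi - lo + 1
    if not 0 <= value < (1 << width):
        raise ValueError(f"value {value:#x} does not fit in bits {lo}-{hi}")
    return value << lo


def pack_qw0(slid, dlid, length_qw, sc, rc=0, becn=0, fecn=0):
    """Build Quad Word 0 from 24-bit LIDs and the other header fields."""
    return (put_field(slid & 0xFFFFF, 0, 19)      # SLID (lower 20 bits)
            | put_field(length_qw, 20, 30)        # Length (in Quad Words)
            | put_field(becn, 31, 31)             # BECN bit
            | put_field(dlid & 0xFFFFF, 32, 51)   # DLID (lower 20 bits)
            | put_field(sc, 52, 56)               # SC (Service Class)
            | put_field(rc, 57, 59)               # RC (Routing Control)
            | put_field(fecn, 60, 60)             # FECN bit
            | put_field(L2_16B, 61, 62)           # L2
            | put_field(LT_HEAD, 63, 63))         # LT


def pack_qw1(slid, dlid, pkey, entropy):
    """Build Quad Word 1; bits 48-63 are reserved and left as zero."""
    return (put_field(L4_TYPE_ETHERNET, 0, 7)         # L4 type
            | put_field((slid >> 20) & 0xF, 8, 11)    # SLID[23:20]
            | put_field((dlid >> 20) & 0xF, 12, 15)   # DLID[23:20]
            | put_field(pkey, 16, 31)                 # PKEY
            | put_field(entropy, 32, 47))             # Entropy
```

Splitting each 24-bit LID this way mirrors the table: the lower 20 bits go
into Quad Word 0 and the upper 4 bits into Quad Word 1.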

The Ethernet packet is padded on the transmit side to ensure that the VNIC
OPA packet is quad word aligned. The 'Tail' field contains the number of
bytes padded. On the receive side, the 'Tail' field is read and the padding
is removed (along with the ICRC, Tail and OPA header) before the packet is
passed up the network stack.
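
The padding arithmetic can be sketched in Python as follows. The byte counts
are assumptions derived from the packet format table rather than from the
driver source: 20 bytes precede the Ethernet packet (Quad Words 0 and 1 plus
the first 32 bits of Quad Word 2), and 5 bytes follow the padding (a 4-byte
ICRC plus one byte shared by Tail and LT). The function names are
hypothetical.

```python
QW_BYTES = 8
HDR_BYTES = 20   # assumed: Quad Words 0-1 plus bits 0-31 of Quad Word 2
ICRC_BYTES = 4   # bits 24-55 of the last quad word
TAIL_BYTES = 1   # Tail (6 bits) and LT (2 bits) share the final byte


def tx_pad_len(eth_len):
    """Pad bytes needed so the whole OPA packet is a multiple of 8 bytes."""
    unpadded = HDR_BYTES + eth_len + ICRC_BYTES + TAIL_BYTES
    return -unpadded % QW_BYTES


def opa_packet_len(eth_len):
    """Total encapsulated length for an Ethernet packet of eth_len bytes."""
    return (HDR_BYTES + eth_len + tx_pad_len(eth_len)
            + ICRC_BYTES + TAIL_BYTES)


def rx_eth_len(packet_len, tail):
    """Receive side: strip header, ICRC, tail byte and 'tail' pad bytes."""
    return packet_len - HDR_BYTES - ICRC_BYTES - TAIL_BYTES - tail
```

Under these assumptions, a 60-byte Ethernet packet needs 3 bytes of padding,
giving an 88-byte (11 quad word) OPA packet. The pad count is always between
0 and 7, so it fits in the 6-bit 'Tail' field.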

The L4 header field contains the ID of the virtual Ethernet switch that the
VNIC port belongs to. On the receive side, this field is used to
de-multiplex the received VNIC packets to different VNIC ports.
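
A minimal sketch of this de-multiplexing step is shown below. It reads the
L4 header from bits 16-31 of Quad Word 2, again assuming that bit 0 is the
least significant bit. The ``VnicDemux`` class and its methods are
hypothetical and only illustrate the lookup; they do not reflect the
driver's data structures.

```python
def vesw_id_of(qw2):
    """Extract the L4 header (bits 16-31 of Quad Word 2); bit 0 assumed LSB."""
    return (qw2 >> 16) & 0xFFFF


class VnicDemux:
    """Maps a virtual Ethernet switch ID to the VNIC port that joined it."""

    def __init__(self):
        self._ports = {}

    def add_port(self, vesw_id, port_name):
        """Register a VNIC port under its virtual Ethernet switch ID."""
        self._ports[vesw_id] = port_name

    def lookup(self, qw2):
        """Return the port for the packet's switch ID, or None if unknown."""
        return self._ports.get(vesw_id_of(qw2))
```

Returning ``None`` for an unknown switch ID leaves the drop policy to the
caller, which is a design choice of this sketch rather than documented
driver behaviour.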

Driver Design
=============
The Intel OPA VNIC software design is presented in the diagram below.
OPA VNIC functionality has a HW dependent component and a HW
independent component.

Support has been added for IB devices to allocate and free RDMA netdev
devices. The RDMA netdev supports interfacing with the network stack,
thus creating standard network interfaces. OPA_VNIC is an RDMA netdev
device type.

The HW dependent VNIC functionality is part of the HFI1 driver. It
implements the verbs to allocate and free the OPA_VNIC RDMA netdev.
It handles HW resource allocation and management for VNIC functionality.
It interfaces with the network stack and implements the required
net_device_ops functions. It expects Omni-Path encapsulated Ethernet
packets in the transmit path and provides HW access to them. It strips
the Omni-Path header from the received packets before passing them up
the network stack. It also implements the RDMA netdev control operations.

The OPA VNIC module implements the HW independent VNIC functionality.
It consists of two parts. The VNIC Ethernet Management Agent (VEMA)
registers itself with IB core as an IB client and interfaces with the
IB MAD stack. It exchanges management information with the Ethernet
Manager (EM) and the VNIC netdev. The VNIC netdev part allocates and frees
the OPA_VNIC RDMA netdev devices. It overrides the net_device_ops functions
set by the HW dependent VNIC driver where required to accommodate any
control operation. It also handles the encapsulation of Ethernet packets
with an Omni-Path header in the transmit path. For each VNIC interface, the
information required for encapsulation is configured by the EM via the VEMA
MAD interface. It also passes any control information to the HW dependent
driver by invoking the RDMA netdev control operations::

  +-------------------+ +----------------------+
  |                   | |       Linux          |
  |     IB MAD        | |      Network         |
  |                   | |       Stack          |
  +-------------------+ +----------------------+
           |               |          |
           |               |          |
  +----------------------------+      |
  |                            |      |
  |      OPA VNIC Module       |      |
  |  (OPA VNIC RDMA Netdev     |      |
  |     & EMA functions)       |      |
  |                            |      |
  +----------------------------+      |
              |                       |
              |                       |
     +------------------+             |
     |     IB core      |             |
     +------------------+             |
              |                       |
              |                       |
  +--------------------------------------------+
  |                                            |
  |      HFI1 Driver with VNIC support         |
  |                                            |
  +--------------------------------------------+