This patch introduces the SCIF documentation in the header file and describes the IOCTL interface for user mode. mic_overview.txt is updated with documentation on SCIF and a new document describing SCIF in more details is available in scif_overview.txt. Reviewed-by: Nikhil Rao <nikhil.rao@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
		
			
				
	
	
		
			99 lines
		
	
	
		
			3.9 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			99 lines
		
	
	
		
			3.9 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
The Symmetric Communication Interface (SCIF (pronounced as skiff)) is a low
 | 
						|
level communications API across PCIe currently implemented for MIC. Currently
 | 
						|
SCIF provides inter-node communication within a single host platform, where a
 | 
						|
node is a MIC Coprocessor or Xeon based host. SCIF abstracts the details of
 | 
						|
communicating over the PCIe bus while providing an API that is symmetric
 | 
						|
across all the nodes in the PCIe network. An important design objective for SCIF
 | 
						|
is to deliver the maximum possible performance given the communication
 | 
						|
abilities of the hardware. SCIF has been used to implement an offload compiler
 | 
						|
runtime and OFED support for MPI implementations for MIC coprocessors.
 | 
						|
 | 
						|
==== SCIF API Components ====
 | 
						|
The SCIF API has the following parts:
 | 
						|
1. Connection establishment using a client server model
 | 
						|
2. Byte stream messaging intended for short messages
 | 
						|
3. Node enumeration to determine online nodes
 | 
						|
4. Poll semantics for detection of incoming connections and messages
 | 
						|
5. Memory registration to pin down pages
 | 
						|
6. Remote memory mapping for low latency CPU accesses via mmap
 | 
						|
7. Remote DMA (RDMA) for high bandwidth DMA transfers
 | 
						|
8. Fence APIs for RDMA synchronization
 | 
						|
 | 
						|
SCIF exposes the notion of a connection which can be used by peer processes on
 | 
						|
nodes in a SCIF PCIe "network" to share memory "windows" and to communicate. A
 | 
						|
process in a SCIF node initiates a SCIF connection to a peer process on a
 | 
						|
different node via a SCIF "endpoint". SCIF endpoints support messaging APIs
 | 
						|
which are similar to connection oriented socket APIs. Connected SCIF endpoints
 | 
						|
can also register local memory which is followed by data transfer using either
 | 
						|
DMA, CPU copies or remote memory mapping via mmap. SCIF supports both user and
 | 
						|
kernel mode clients which are functionally equivalent.
 | 
						|
 | 
						|
==== SCIF Performance for MIC ====
 | 
						|
DMA bandwidth comparison between the TCP (over ethernet over PCIe) stack versus
 | 
						|
SCIF shows the performance advantages of SCIF for HPC applications and runtimes.
 | 
						|
 | 
						|
             Comparison of TCP and SCIF based BW
 | 
						|
 | 
						|
  Throughput (GB/sec)
 | 
						|
    8 +                                             PCIe Bandwidth ******
 | 
						|
      +                                                        TCP ######
 | 
						|
    7 +    **************************************             SCIF %%%%%%
 | 
						|
      |                       %%%%%%%%%%%%%%%%%%%
 | 
						|
    6 +                   %%%%
 | 
						|
      |                 %%
 | 
						|
      |               %%%
 | 
						|
    5 +              %%
 | 
						|
      |            %%
 | 
						|
    4 +           %%
 | 
						|
      |          %%
 | 
						|
    3 +         %%
 | 
						|
      |        %
 | 
						|
    2 +      %%
 | 
						|
      |     %%
 | 
						|
      |    %
 | 
						|
    1 +
 | 
						|
      +    ######################################
 | 
						|
    0 +++---+++--+--+-+--+--+-++-+--+-++-+--+-++-+-
 | 
						|
      1       10     100      1000   10000   100000
 | 
						|
                   Transfer Size (KBytes)
 | 
						|
 | 
						|
SCIF allows memory sharing via mmap(..) between processes on different PCIe
 | 
						|
nodes and thus provides bare-metal PCIe latency. The round trip SCIF mmap
 | 
						|
latency from the host to an x100 MIC for an 8 byte message is 0.44 usecs.
 | 
						|
 | 
						|
SCIF has a user space library which is a thin IOCTL wrapper providing a user
 | 
						|
space API similar to the kernel API in scif.h. The SCIF user space library
 | 
						|
is distributed @ https://software.intel.com/en-us/mic-developer
 | 
						|
 | 
						|
Here is some pseudo code for an example of how two applications on two PCIe
 | 
						|
nodes would typically use the SCIF API:
 | 
						|
 | 
						|
Process A (on node A)			Process B (on node B)
 | 
						|
 | 
						|
/* get online node information */
 | 
						|
scif_get_node_ids(..)			scif_get_node_ids(..)
 | 
						|
scif_open(..)				scif_open(..)
 | 
						|
scif_bind(..)				scif_bind(..)
 | 
						|
scif_listen(..)
 | 
						|
scif_accept(..)				scif_connect(..)
 | 
						|
/* SCIF connection established */
 | 
						|
 | 
						|
/* Send and receive short messages */
 | 
						|
scif_send(..)/scif_recv(..)		scif_send(..)/scif_recv(..)
 | 
						|
 | 
						|
/* Register memory */
 | 
						|
scif_register(..)			scif_register(..)
 | 
						|
 | 
						|
/* RDMA */
 | 
						|
scif_readfrom(..)/scif_writeto(..)	scif_readfrom(..)/scif_writeto(..)
 | 
						|
 | 
						|
/* Fence DMAs */
 | 
						|
scif_fence_signal(..)			scif_fence_signal(..)
 | 
						|
 | 
						|
mmap(..)				mmap(..)
 | 
						|
 | 
						|
/* Access remote registered memory */
 | 
						|
 | 
						|
/* Close the endpoints */
 | 
						|
scif_close(..)				scif_close(..)
 |