Instead of storing the concepts dictionary inside header file, move it to the subsystem documentation. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
		
			
				
	
	
		
			179 lines
		
	
	
		
			6.1 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			179 lines
		
	
	
		
			6.1 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
| Error Detection And Correction (EDAC) Devices
 | |
| =============================================
 | |
| 
 | |
| Main Concepts used at the EDAC subsystem
 | |
| ----------------------------------------
 | |
| 
 | |
| There are several things to be aware of that aren't at all obvious, like
 | |
| *sockets, *socket sets*, *banks*, *rows*, *chip-select rows*, *channels*,
 | |
| etc...
 | |
| 
 | |
| These are some of the many terms that are thrown about that don't always
 | |
| mean what people think they mean (Inconceivable!).  In the interest of
 | |
| creating a common ground for discussion, terms and their definitions
 | |
| will be established.
 | |
| 
 | |
| * Memory devices
 | |
| 
 | |
| The individual DRAM chips on a memory stick.  These devices commonly
 | |
| output 4 and 8 bits each (x4, x8). Grouping several of these in parallel
 | |
| provides the number of bits that the memory controller expects:
 | |
| typically 72 bits, in order to provide 64 bits + 8 bits of ECC data.
 | |
| 
 | |
| * Memory Stick
 | |
| 
 | |
| A printed circuit board that aggregates multiple memory devices in
 | |
| parallel.  In general, this is the Field Replaceable Unit (FRU) which
 | |
| gets replaced, in the case of excessive errors. Most often it is also
 | |
| called DIMM (Dual Inline Memory Module).
 | |
| 
 | |
| * Memory Socket
 | |
| 
 | |
| A physical connector on the motherboard that accepts a single memory
 | |
| stick. Also called as "slot" on several datasheets.
 | |
| 
 | |
| * Channel
 | |
| 
 | |
| A memory controller channel, responsible to communicate with a group of
 | |
| DIMMs. Each channel has its own independent control (command) and data
 | |
| bus, and can be used independently or grouped with other channels.
 | |
| 
 | |
| * Branch
 | |
| 
 | |
| It is typically the highest hierarchy on a Fully-Buffered DIMM memory
 | |
| controller. Typically, it contains two channels. Two channels at the
 | |
| same branch can be used in single mode or in lockstep mode. When
 | |
| lockstep is enabled, the cacheline is doubled, but it generally brings
 | |
| some performance penalty. Also, it is generally not possible to point to
 | |
| just one memory stick when an error occurs, as the error correction code
 | |
| is calculated using two DIMMs instead of one. Due to that, it is capable
 | |
| of correcting more errors than on single mode.
 | |
| 
 | |
| * Single-channel
 | |
| 
 | |
| The data accessed by the memory controller is contained into one dimm
 | |
| only. E. g. if the data is 64 bits-wide, the data flows to the CPU using
 | |
| one 64 bits parallel access. Typically used with SDR, DDR, DDR2 and DDR3
 | |
| memories. FB-DIMM and RAMBUS use a different concept for channel, so
 | |
| this concept doesn't apply there.
 | |
| 
 | |
| * Double-channel
 | |
| 
 | |
| The data size accessed by the memory controller is interlaced into two
 | |
| dimms, accessed at the same time. E. g. if the DIMM is 64 bits-wide (72
 | |
| bits with ECC), the data flows to the CPU using a 128 bits parallel
 | |
| access.
 | |
| 
 | |
| * Chip-select row
 | |
| 
 | |
| This is the name of the DRAM signal used to select the DRAM ranks to be
 | |
| accessed. Common chip-select rows for single channel are 64 bits, for
 | |
| dual channel 128 bits. It may not be visible by the memory controller,
 | |
| as some DIMM types have a memory buffer that can hide direct access to
 | |
| it from the Memory Controller.
 | |
| 
 | |
| * Single-Ranked stick
 | |
| 
 | |
| A Single-ranked stick has 1 chip-select row of memory. Motherboards
 | |
| commonly drive two chip-select pins to a memory stick. A single-ranked
 | |
| stick, will occupy only one of those rows. The other will be unused.
 | |
| 
 | |
| .. _doubleranked:
 | |
| 
 | |
| * Double-Ranked stick
 | |
| 
 | |
| A double-ranked stick has two chip-select rows which access different
 | |
| sets of memory devices.  The two rows cannot be accessed concurrently.
 | |
| 
 | |
| * Double-sided stick
 | |
| 
 | |
| **DEPRECATED TERM**, see :ref:`Double-Ranked stick <doubleranked>`.
 | |
| 
 | |
| A double-sided stick has two chip-select rows which access different sets
 | |
| of memory devices. The two rows cannot be accessed concurrently.
 | |
| "Double-sided" is irrespective of the memory devices being mounted on
 | |
| both sides of the memory stick.
 | |
| 
 | |
| * Socket set
 | |
| 
 | |
| All of the memory sticks that are required for a single memory access or
 | |
| all of the memory sticks spanned by a chip-select row.  A single socket
 | |
| set has two chip-select rows and if double-sided sticks are used these
 | |
| will occupy those chip-select rows.
 | |
| 
 | |
| * Bank
 | |
| 
 | |
| This term is avoided because it is unclear when needing to distinguish
 | |
| between chip-select rows and socket sets.
 | |
| 
 | |
| 
 | |
| Memory Controllers
 | |
| ------------------
 | |
| 
 | |
| Most of the EDAC core is focused on doing Memory Controller error detection.
 | |
| The :c:func:`edac_mc_alloc`. It uses internally the struct ``mem_ctl_info``
 | |
| to describe the memory controllers, with is an opaque struct for the EDAC
 | |
| drivers. Only the EDAC core is allowed to touch it.
 | |
| 
 | |
| .. kernel-doc:: include/linux/edac.h
 | |
| 
 | |
| .. kernel-doc:: drivers/edac/edac_mc.h
 | |
| 
 | |
| PCI Controllers
 | |
| ---------------
 | |
| 
 | |
| The EDAC subsystem provides a mechanism to handle PCI controllers by calling
 | |
| the :c:func:`edac_pci_alloc_ctl_info`. It will use the struct
 | |
| :c:type:`edac_pci_ctl_info` to describe the PCI controllers.
 | |
| 
 | |
| .. kernel-doc:: drivers/edac/edac_pci.h
 | |
| 
 | |
| EDAC Blocks
 | |
| -----------
 | |
| 
 | |
| The EDAC subsystem also provides a generic mechanism to report errors on
 | |
| other parts of the hardware via :c:func:`edac_device_alloc_ctl_info` function.
 | |
| 
 | |
| The structures :c:type:`edac_dev_sysfs_block_attribute`,
 | |
| :c:type:`edac_device_block`, :c:type:`edac_device_instance` and
 | |
| :c:type:`edac_device_ctl_info` provide a generic or abstract 'edac_device'
 | |
| representation at sysfs.
 | |
| 
 | |
| This set of structures and the code that implements the APIs for the same, provide for registering EDAC type devices which are NOT standard memory or
 | |
| PCI, like:
 | |
| 
 | |
| - CPU caches (L1 and L2)
 | |
| - DMA engines
 | |
| - Core CPU switches
 | |
| - Fabric switch units
 | |
| - PCIe interface controllers
 | |
| - other EDAC/ECC type devices that can be monitored for
 | |
|   errors, etc.
 | |
| 
 | |
| It allows for a 2 level set of hierarchy.
 | |
| 
 | |
| For example, a cache could be composed of L1, L2 and L3 levels of cache.
 | |
| Each CPU core would have its own L1 cache, while sharing L2 and maybe L3
 | |
| caches. On such case, those can be represented via the following sysfs
 | |
| nodes::
 | |
| 
 | |
| 	/sys/devices/system/edac/..
 | |
| 
 | |
| 	pci/		<existing pci directory (if available)>
 | |
| 	mc/		<existing memory device directory>
 | |
| 	cpu/cpu0/..	<L1 and L2 block directory>
 | |
| 		/L1-cache/ce_count
 | |
| 			 /ue_count
 | |
| 		/L2-cache/ce_count
 | |
| 			 /ue_count
 | |
| 	cpu/cpu1/..	<L1 and L2 block directory>
 | |
| 		/L1-cache/ce_count
 | |
| 			 /ue_count
 | |
| 		/L2-cache/ce_count
 | |
| 			 /ue_count
 | |
| 	...
 | |
| 
 | |
| 	the L1 and L2 directories would be "edac_device_block's"
 | |
| 
 | |
| .. kernel-doc:: drivers/edac/edac_device.h
 |