forked from Minki/linux
639781dcab
When trying to debug program, the user often needs to dump large parts of the device's DRAM, which can reach to tens of GBs. Because reading from the device's internal memory through the PCI BAR is extremely slow, the debug can take hours. Instead, we can provide the user to copy data through one of the DMA engines. This will make the operation much faster. Currently, only GAUDI is supported. In GAUDI, we need to find a PCI DMA engine that is IDLE and set the DMA as secured to be able to bypass our MMU as we currently don't map the temporary buffer to the MMU. Example bash one-line to dump entire HBM to file (~2 minutes): for (( i=0x0; i < 0x800000000; i+=0x8000000 )); do \ printf '0x%x\n' $i | sudo tee /sys/kernel/debug/habanalabs/hl0/addr ; \ echo 0x8000000 | sudo tee /sys/kernel/debug/habanalabs/hl0/dma_size ; \ sudo cat /sys/kernel/debug/habanalabs/hl0/data_dma >> hbm.txt ; done Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
231 lines
10 KiB
Plaintext
231 lines
10 KiB
Plaintext
What: /sys/kernel/debug/habanalabs/hl<n>/addr
|
|
Date: Jan 2019
|
|
KernelVersion: 5.1
|
|
Contact: ogabbay@kernel.org
|
|
Description: Sets the device address to be used for read or write through
|
|
PCI bar, or the device VA of a host mapped memory to be read or
|
|
written directly from the host. The latter option is allowed
|
|
only when the IOMMU is disabled.
|
|
The acceptable value is a string that starts with "0x"
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/clk_gate
|
|
Date: May 2020
|
|
KernelVersion: 5.8
|
|
Contact: ogabbay@kernel.org
|
|
Description: Allow the root user to disable/enable in runtime the clock
|
|
gating mechanism in Gaudi. Due to how Gaudi is built, the
|
|
clock gating needs to be disabled in order to access the
|
|
registers of the TPC and MME engines. This is sometimes needed
|
|
during debug by the user and hence the user needs this option.
|
|
The user can supply a bitmask value, each bit represents
|
|
a different engine to disable/enable its clock gating feature.
|
|
The bitmask is composed of 20 bits:
|
|
|
|
======= ============
|
|
0 - 7 DMA channels
|
|
8 - 11 MME engines
|
|
12 - 19 TPC engines
|
|
======= ============
|
|
|
|
The bit's location of a specific engine can be determined
|
|
using (1 << GAUDI_ENGINE_ID_*). GAUDI_ENGINE_ID_* values
|
|
are defined in uapi habanalabs.h file in enum gaudi_engine_id
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/command_buffers
|
|
Date: Jan 2019
|
|
KernelVersion: 5.1
|
|
Contact: ogabbay@kernel.org
|
|
Description: Displays a list with information about the currently allocated
|
|
command buffers
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/command_submission
|
|
Date: Jan 2019
|
|
KernelVersion: 5.1
|
|
Contact: ogabbay@kernel.org
|
|
Description: Displays a list with information about the currently active
|
|
command submissions
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/command_submission_jobs
|
|
Date: Jan 2019
|
|
KernelVersion: 5.1
|
|
Contact: ogabbay@kernel.org
|
|
Description: Displays a list with detailed information about each JOB (CB) of
|
|
each active command submission
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/data32
|
|
Date: Jan 2019
|
|
KernelVersion: 5.1
|
|
Contact: ogabbay@kernel.org
|
|
Description: Allows the root user to read or write directly through the
|
|
device's PCI bar. Writing to this file generates a write
|
|
transaction while reading from the file generates a read
|
|
transaction. This custom interface is needed (instead of using
|
|
the generic Linux user-space PCI mapping) because the DDR bar
|
|
is very small compared to the DDR memory and only the driver can
|
|
move the bar before and after the transaction.
|
|
|
|
If the IOMMU is disabled, it also allows the root user to read
|
|
or write from the host a device VA of a host mapped memory
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/data64
|
|
Date: Jan 2020
|
|
KernelVersion: 5.6
|
|
Contact: ogabbay@kernel.org
|
|
Description: Allows the root user to read or write 64 bit data directly
|
|
through the device's PCI bar. Writing to this file generates a
|
|
write transaction while reading from the file generates a read
|
|
transaction. This custom interface is needed (instead of using
|
|
the generic Linux user-space PCI mapping) because the DDR bar
|
|
is very small compared to the DDR memory and only the driver can
|
|
move the bar before and after the transaction.
|
|
|
|
If the IOMMU is disabled, it also allows the root user to read
|
|
or write from the host a device VA of a host mapped memory
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/data_dma
|
|
Date: Apr 2021
|
|
KernelVersion: 5.13
|
|
Contact: ogabbay@kernel.org
|
|
Description: Allows the root user to read from the device's internal
|
|
memory (DRAM/SRAM) through a DMA engine.
|
|
This property is a binary blob that contains the result of the
|
|
DMA transfer.
|
|
This custom interface is needed (instead of using the generic
|
|
Linux user-space PCI mapping) because the amount of internal
|
|
memory is huge (>32GB) and reading it via the PCI bar will take
|
|
a very long time.
|
|
This interface doesn't support concurrency in the same device.
|
|
In GAUDI and GOYA, this action can cause undefined behavior
|
|
in case the it is done while the device is executing user
|
|
workloads.
|
|
Only supported on GAUDI at this stage.
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/device
|
|
Date: Jan 2019
|
|
KernelVersion: 5.1
|
|
Contact: ogabbay@kernel.org
|
|
Description: Enables the root user to set the device to specific state.
|
|
Valid values are "disable", "enable", "suspend", "resume".
|
|
User can read this property to see the valid values
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/dma_size
|
|
Date: Apr 2021
|
|
KernelVersion: 5.13
|
|
Contact: ogabbay@kernel.org
|
|
Description: Specify the size of the DMA transaction when using DMA to read
|
|
from the device's internal memory. The value can not be larger
|
|
than 128MB. Writing to this value initiates the DMA transfer.
|
|
When the write is finished, the user can read the "data_dma"
|
|
blob
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/dump_security_violations
|
|
Date: Jan 2021
|
|
KernelVersion: 5.12
|
|
Contact: ogabbay@kernel.org
|
|
Description: Dumps all security violations to dmesg. This will also ack
|
|
all security violations meanings those violations will not be
|
|
dumped next time user calls this API
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/engines
|
|
Date: Jul 2019
|
|
KernelVersion: 5.3
|
|
Contact: ogabbay@kernel.org
|
|
Description: Displays the status registers values of the device engines and
|
|
their derived idle status
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/i2c_addr
|
|
Date: Jan 2019
|
|
KernelVersion: 5.1
|
|
Contact: ogabbay@kernel.org
|
|
Description: Sets I2C device address for I2C transaction that is generated
|
|
by the device's CPU
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/i2c_bus
|
|
Date: Jan 2019
|
|
KernelVersion: 5.1
|
|
Contact: ogabbay@kernel.org
|
|
Description: Sets I2C bus address for I2C transaction that is generated by
|
|
the device's CPU
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/i2c_data
|
|
Date: Jan 2019
|
|
KernelVersion: 5.1
|
|
Contact: ogabbay@kernel.org
|
|
Description: Triggers an I2C transaction that is generated by the device's
|
|
CPU. Writing to this file generates a write transaction while
|
|
reading from the file generates a read transcation
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/i2c_reg
|
|
Date: Jan 2019
|
|
KernelVersion: 5.1
|
|
Contact: ogabbay@kernel.org
|
|
Description: Sets I2C register id for I2C transaction that is generated by
|
|
the device's CPU
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/led0
|
|
Date: Jan 2019
|
|
KernelVersion: 5.1
|
|
Contact: ogabbay@kernel.org
|
|
Description: Sets the state of the first S/W led on the device
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/led1
|
|
Date: Jan 2019
|
|
KernelVersion: 5.1
|
|
Contact: ogabbay@kernel.org
|
|
Description: Sets the state of the second S/W led on the device
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/led2
|
|
Date: Jan 2019
|
|
KernelVersion: 5.1
|
|
Contact: ogabbay@kernel.org
|
|
Description: Sets the state of the third S/W led on the device
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/mmu
|
|
Date: Jan 2019
|
|
KernelVersion: 5.1
|
|
Contact: ogabbay@kernel.org
|
|
Description: Displays the hop values and physical address for a given ASID
|
|
and virtual address. The user should write the ASID and VA into
|
|
the file and then read the file to get the result.
|
|
e.g. to display info about VA 0x1000 for ASID 1 you need to do:
|
|
echo "1 0x1000" > /sys/kernel/debug/habanalabs/hl0/mmu
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/mmu_error
|
|
Date: Mar 2021
|
|
KernelVersion: 5.12
|
|
Contact: fkassabri@habana.ai
|
|
Description: Check and display page fault or access violation mmu errors for
|
|
all MMUs specified in mmu_cap_mask.
|
|
e.g. to display error info for MMU hw cap bit 9, you need to do:
|
|
echo "0x200" > /sys/kernel/debug/habanalabs/hl0/mmu_error
|
|
cat /sys/kernel/debug/habanalabs/hl0/mmu_error
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/set_power_state
|
|
Date: Jan 2019
|
|
KernelVersion: 5.1
|
|
Contact: ogabbay@kernel.org
|
|
Description: Sets the PCI power state. Valid values are "1" for D0 and "2"
|
|
for D3Hot
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/stop_on_err
|
|
Date: Mar 2020
|
|
KernelVersion: 5.6
|
|
Contact: ogabbay@kernel.org
|
|
Description: Sets the stop-on_error option for the device engines. Value of
|
|
"0" is for disable, otherwise enable.
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/userptr
|
|
Date: Jan 2019
|
|
KernelVersion: 5.1
|
|
Contact: ogabbay@kernel.org
|
|
Description: Displays a list with information about the currently user
|
|
pointers (user virtual addresses) that are pinned and mapped
|
|
to DMA addresses
|
|
|
|
What: /sys/kernel/debug/habanalabs/hl<n>/vm
|
|
Date: Jan 2019
|
|
KernelVersion: 5.1
|
|
Contact: ogabbay@kernel.org
|
|
Description: Displays a list with information about all the active virtual
|
|
address mappings per ASID and all user mappings of HW blocks
|