In order to minimize hard coded values between F/W and the driver, we
send msi-x indexes dynamically to the F/W.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In case of multiple ECC errors, FW will set the DEVICE_UNUSABLE bit.
On boot-up, the driver will therefore fail inserting the device.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In order to use minimum of hard coded values common to LKD and F/W
a dynamic method to work with PLLs is introduced in this patch.
Formerly asic specific PLL numbering is now common for all asics.
To be backward compatible a bit in dev status is defined, if the bit is
not set LKD will keep working with old PLL numbering.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Add driver implementation for reading the current power from the device
CPU F/W.
Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
As the F/wW is the first to detect out of sync event, a new event is
added to notify the driver on such event. In which case the driver
performs hard reset.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Current messaging communictaion protocol with cpucp can get out
of sync due to coherency issues. In order to improve the protocol
reliability, we modify the protocol to expect a different
acknowledgment for every packet sent to cpucp.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
The firmware provides more information about SyncManager events.
Adjust the code to the latest firmware interface file.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Driver must print sync manager SEI information upon receiving
interrupt from FW.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Boot cpu can report errors in various boot stages.
Current implementaion does not take into consideration errors
reported in late stages, hence we will check for errors at the most
late stage when fetching cpucp information.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In case the clock gating was enabled in preboot we need to disable it
at the H/W initialization stage before touching the MME/TPC registers.
Otherwise, the ASIC can get stuck. If the security is enabled in
the firmware level, the CGM is always disabled and the driver can't
enable it.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
FW hard reset capability indication is now moved to preboot stage.
Driver will check if HW is dirty only after it validated preboot
is up. If HW is dirty, driver will perform a hard reset according
to the FW capability.
In addition, FW defines a new message which driver need to send in
order to initiate a hard reset.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Once firmware security is enabled, driver must fetch pll frequencies
through the firmware message interface instead of reading the registers
directly.
Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Once FW security is enabled there is no access to HBM ecc registers,
need to read values from FW using a dedicated interface.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Driver must fetch FW hard reset capability during boot time,
in order to skip the hard reset flow if necessary.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Add support for fetching security indication from FW.
This indication is needed in order to skip unnecessary
initializations done by FW.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Add new structures and messages that the driver use to interact with the
firmware to receive information and events (errors) about GAUDI's NIC.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Add new packet to fetch PLL information from firmware. This will be needed
in the future when the driver won't be able to access the PLL registers
directly
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
For future ASICs, we increase this field by one nibble. This field was not
used by the current ASICs so this change doesn't break anything.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Because the device CPU compiler aligns structures to 8 bytes,
struct cpucp_info has an alignment issue as some parts
in the structure are not aligned to 8 bytes.
It is preferred that we explicitly insert placeholders inside
the structure to avoid confusion
in order to validate this scenario, we printed both pointers:
__u8 cpucp_version[VERSION_MAX_LEN]; (0xffff899c67ed4cbc)
__le64 dram_size; (0xffff899c67ed4d40)
we see difference of 132 bytes although the first array
is only 128 bytes long, Meaning compiler added a 4 byte padding.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
ArmCP mandates that the device CPU is always an ARM processor, which might
be wrong in the future.
Most of this change is an internal renaming of variables, functions and
defines but there are two entries in sysfs which have armcp in their
names. Add identical cpucp entries but don't remove yet the armcp entries.
Those will be deprecated next year. Add the documentation about it in sysfs
documentation.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Update firmware header with new API for getting pcie info
such as tx/rx throughput and replay counter.
These counters are needed by customers for monitor and maintenance
of multiple devices.
Add new opcodes to the INFO ioctl to retrieve these counters.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Update the boot interface file from the latest version from firmware.
Defines for secure boot were added.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
For internal needs of our CI we need to move all the common code into a
common folder instead of putting them in the root folder of the driver.
Same applies to the common header files under include/
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>