The Coresight timestamp is enabled for a specific debug session using
the HL_DEBUG_OP_TIMESTAMP opcode of the debug IOCTL.
In order to have a perpetual timestamp that would be comparable between
various debug sessions, this patch moves the timestamp enablement to be
part of the HW initialization.
The HL_DEBUG_OP_TIMESTAMP opcode turns to be deprecated and shouldn't be
used. Old user-space that will call it won't see any change in the
behavior of the debug session.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Add a meaningful name to the general PSOC application status register
which better describes its usage in keeping the HW state.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The PSOC scratch-pad registers are used for communication with the
device CPU. This patch adds new definitions for these registers which
are more descriptive than their general names.
The new set of definitions also gathers and documents the current usage
of the scratch-pad registers by the driver and the device CPU.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
When unmasking IRQs inside the ASIC, the driver passes an array of all the
IRQ to unmask. The ASIC's CPU is working in LE so when running in a BE
host, the driver needs to do the proper endianness swapping when preparing
this array.
In addition, this patch also fixes the endianness of a couple of kernel log
debug messages that print values of packets
Signed-off-by: Ben Segal <bpsegal20@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The PQs of internal H/W queues (QMANs) can be located in different memory
areas for different ASICs. Therefore, when writing PQEs, we need to use
the correct function according to the location of the PQ. e.g. if the PQ
is located in the device's memory (SRAM or DRAM), we need to use
memcpy_toio() so it would work in architectures that have separate
address ranges for IO memory.
This patch makes the code that writes the PQE to be ASIC-specific so we
can handle this properly per ASIC.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Tested-by: Ben Segal <bpsegal20@gmail.com>
Packets that arrive from the user and need to be parsed by the driver are
assumed to be in LE format.
This patch fix all the places where the code handles these packets and use
the correct endianness macros to handle them, as the driver handles the
packets in CPU format (LE or BE depending on the arch).
Signed-off-by: Ben Segal <bpsegal20@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch fix a bug in the host memory polling macro. The bug is that the
memory being polled can be written by the device, which always writes it
in LE. However, if the host is running Linux in BE mode, we need to
convert the value that was written by the device before matching it to the
required value that the caller has given to the macro.
Signed-off-by: Ben Segal <bpsegal20@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
VRHOT event from the F/W indicates the device has reached a temperature of
100 Celsius degrees. In this case, the driver should only print this
information to the kernel log. The device will shutdown itself
automatically when reaching 125 degrees.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
dma_addr_t might be different sizes depending on the configuration,
so we cannot print it as %llx:
drivers/misc/habanalabs/goya/goya.c: In function 'goya_sw_init':
drivers/misc/habanalabs/goya/goya.c:698:21: error: format '%llx' expects argument of type 'long long unsigned int', but argument 4 has type 'dma_addr_t' {aka 'unsigned int'} [-Werror=format=]
Use the special %pad format string. This requires passing the
argument by reference.
Fixes: 2a51558c8c ("habanalabs: remove DMA mask hack for Goya")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The information which is currently provided as a response to the
"HL_INFO_HW_IDLE" IOCTL is merely a general boolean value.
This patch extends it and provides also a bitmask that indicates which
of the device engines are busy.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Command submissions sent to the device are composed of command buffers
which are targeted to different device engines, like DMA and compute
entities. When a command submission gets stuck, knowing in which engine
the stuck is, is crucial for debugging.
This patch adds a debugfs node that exports this information, by
displaying the engines' various registers that assemble their idle/busy
status.
The information retrieval is based on the is_device_idle ASIC function.
The printout in this function, of the first detected busy engine, is
removed because it becomes redundant in the presence of the more
elaborated info of the new debugfs node.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The patch updates the device idle check:
- Add reading the DMA core status register, because it is possible that
a QMAN has finished its work but the DMA itself is still running.
- Remove the MME shadow status check, as the MME ARCH status register
includes the status of all MME shadows.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Allows using the addr/data32 debugfs nodes to access a device VA of a
host mapped memory when the IOMMU is disabled.
Due to the possible large amount of a user host mapped memory, the
driver doesn't maintain a database with the host addresses per device VA.
When the IOMMU is disabled, this missing info is being overcome by
simply using phys_to_virt(). However, this is not useful when the IOMMU
is enabled, and thus the enforced limitation.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch removes the non-standard DMA mask setting for Goya. Now that
the device CPU goes through the MMU, we are not limited to allocating the
CPU accessible memory area in the address space of under 39 bits.
Therefore, we don't need to set the DMA masking twice during
initialization, a practice that is not working on POWER architecture.
The patch sets the DMA mask to 48 bits once during the initialization. The
address of the CPU accessible memory area is configured to the MMU and the
matching VA is given to the device CPU.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch configures the Goya CPU to actually go through the MMU for
translation. The configuration is done after the configuration of the
relevant MMU mappings.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds the necessary MMU mappings for the Goya CPU to access the
device DRAM and the host memory.
The first 256MB of the device DRAM is being mapped. That's where the F/W
is running.
The 2MB area located on the host memory for the purpose of communication
between the driver and the device CPU is also being mapped.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch changes the order of H/W IP initializations. The MMU needs to
be initialized before the device CPU queues, because the CPU will go
through the ASIC MMU in order to reach the host memory (where the queues
are located).
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
unsecured registers can be changed by the user, and hence should be
restored to their default values in context switch
Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
On context switch we need to ensure that each user is not be affected by
other user, so we need to clear sync objects and monitors in context
switch instead of in restore_phase_topology function.
Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch removes a limitation on the maximum packet size that is read by
the device CPU as that limitation is not needed.
Therefore, the patch also removes an elaborate calculation that is based
on this limitation which is also not needed now. Instead, use a fixed
value for the memory pool size of the packets.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds support to the goya memset function to perform memset to
device memory with size larger then 4GB. In this case, we need to use
multiple LIN_DMA packets because a single packet supports up to 4GB.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch improves the error reporting in case of fatal and non-RAZWI
events such that the event name is printed in addition to the IRQ number.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds a new parameter that is passed to the
add_end_of_cb_packets() asic-specific function.
The parameter is the pointer to the driver's device structure. The
function needs this pointer for future ASICs.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch changes two polling functions to macros, in order to make their
API the same as the standard readl_poll_timeout so we would be able to
define the "condition for exit" when calling these macros.
This will simplify the code as it will eliminate the need to check both
for timeout and for the (cond) in the calling function.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The driver allocates memory for fence object with GFP_ZERO flag, so there
is no need to explicitly write 0 to the allocated object after the
allocation.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Driver-initiated DMA jobs are synchronized jobs, i.e. the driver polls on
fence object until the job is finished. There is no interrupt from the
device. Therefore, no need to add space for 2 * msg_prot packets to the
end of the CB. Only a single msg_prot is needed (to write the fence).
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch changes the order of checks when initializing the device CPU.
We want first to check if we need to load the F/W, and only if we need to,
then we want to check the status of the CPU boot program.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch removes redundant CPU availability checks in:
goya_test_queues() - will be done in goya_test_cpu_queue().
goya_ring_doorbell() - was done earlier in goya_send_cpu_message().
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch fix a potential bug where a user's process has closed
unexpectedly without disabling the debug engines. In that case, the debug
engines might continue running but because the user's MMU mappings are
going away, we will get page fault errors.
This behavior is also opposed to the general rule where nothing runs on
the device after the user process closes.
The patch stops the debug H/W engines upon process termination and thus
makes sure nothing runs on the device after the process goes away.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The CPU accessible DMA memory is general and not used only for PQ.
Accordingly, this patch renames the "free_cpu_pq_dma_mem" label with
"free_cpu_dma_mem".
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The CPU accessible DMA pool is general and not used only for PQ.
Accordingly, this patch rename the "free_cpu_pq_pool" label with
"free_cpu_accessible_dma_pool".
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
After removing the parsing of the command submission
when doing memset of the device memory, goya_validate_dma_pkt_host
is never called by the kernel, so there is no need to check
context id.
Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
use_virt_addr member was used for telling whether to treat the
addresses in the CB as virtual during parsing. We disabled it only
when calling the parser from the driver memset device function,
and since this call had been removed, it should always be enabled.
Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Routing device accesses to the host memory requires the usage of a base
offset, which is canceled by the iATU just before leaving the device.
The value of the base offset might be distinctive between different ASIC
types.
The manipulation of the addresses is currently used throughout the
driver code, and one should be aware to it whenever providing a host
memory address to the device.
This patch removes this manipulation from the driver common code, and
moves it to the ASIC specific functions that are responsible for
host memory allocation/mapping.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch renames four functions in the ASIC-specific functions section,
so it will be easier to differentiate them from the generic kernel
functions with the same name.
This will help in future code reviews, to make sure we don't use the
kernel functions directly.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
There is no need to parse the command submission when doing memset
of the device memory using the DMA engine because only the driver calls
the memset function and therefore, the CS is trusted and doesn't require
validation and patching.
Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch changes the ASIC interface function that changes the DRAM bar
window. The change is to return the old address that the DRAM bar pointed
to instead of an error code.
This simplifies the code that use this function (mainly in debugfs) to
restore the bar to the old setting.
This is also needed for easier support in future ASICs.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch slightly changes the macros of RREG32 and WREG32, which are
used when reading or writing from registers.
Instead of directly calling a function in the common code from these
macros, the new code calls a function from the ASIC functions interface.
This change allows us to share much more code between real ASICs and
simulators, which in turn reduces the maintenance burden and
the chances for forgetting to port code between the ASIC files.
The patch also implements the hl_poll_timeout macro, instead of calling
the generic readl_poll_timeout macro. This is required to allow use of
this macro in the simulator files.
As a result from this change, more functions in goya.c are shared with the
simulator and therefore, should not be defined as static.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch re-factors goya_parse_cb_no_ext_queue() to make it more
readable by inverting the check inside the first if statement so the bulk
of the function won't be inside an if statement.
The patch also fixes a spelling error in the name of the function.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
During hard-reset, contexts are closed as part of the tear-down process.
After a context is closed, the driver cleans up the page tables of that
context in the device's DRAM. This action is both dangerous and
unnecessary.
It is unnecessary, because the device is going through a hard-reset, which
means the device's DRAM contents are no longer valid and the device's MMU
is being reset.
It is dangerous, because if the hard-reset came as a result of a PCI
freeze, this action may cause the entire host machine to hang.
Therefore, prevent all device PTE updates when a hard-reset operation is
pending.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch does some refactoring in goya.c to make code more reusable
between goya code and the goya simulator code (which is not upstreamed).
In addition, the patch removes some dead functions from goya.c which are
not used by the current upstream code
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch adds the ASIC-specific function for GOYA to configure the
coresight components.
Most of the components have an enabled/disabled flag, depending on whether
the user wants to enable the component or disable it.
For some of the components, such as ETR and SPMU, the user can also
request to read values from them. Those values are needed for the user to
parse the trace data.
The ETR configuration is also checked for security purposes, to make sure
the trace data is written to the device's DRAM.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Habanalabs ASICs use the ARM coresight infrastructure to support debug,
tracing and profiling of neural networks topologies.
Because the coresight is configured using register writes and reads, and
some of the registers hold sensitive information (e.g. the address in
the device's DRAM where the trace data is written to), the user must go
through the kernel driver to configure this mechanism.
This patch implements the common code of the IOCTL and calls the
ASIC-specific function for the actual H/W configuration.
The IOCTL supports configuration of seven coresight components:
ETR, ETF, STM, FUNNEL, BMON, SPMU and TIMESTAMP
The user specifies which component he wishes to configure and provides a
pointer to a structure (located in its process space) that contains the
relevant configuration.
The common code copies the relevant data from the user-space to kernel
space and then calls the ASIC-specific function to do the H/W
configuration.
After the configuration is done, which is usually composed
of several IOCTL calls depending on what the user wanted to trace, the
user can start executing the topology. The trace data will be written to
the user's area in the device's DRAM.
After the tracing operation is complete, and user will call the IOCTL
again to disable the tracing operation. The user also need to read
values from registers for some of the components (e.g. the size of the
trace data in the device's DRAM). In that case, the user will provide a
pointer to an "output" structure in user-space, which the IOCTL code will
fill according the to selected component.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Because DMA #0 is now used by the user, remove the limitation of credits
from this channel. Without this patch, this channel is pretty much
unusable due to its very low bandwidth configuration.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
On init or context switch, set TPC clock relaxation counter
register to a golden value.
Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch refactors the code that is responsible to set the DMA mask for
the device.
Upon each change of the dma mask, the driver will save the new value that
was set. This is needed in order to make sure we don't try to increase the
mask a second time, in case we failed in the first time. This is
especially relevant for Power machines, as that may cause a change in
configuration of the TVT which will break the device.
Goya will first try to set the device's dma mask to 39 bits, so that the
memory that is allocated on the host machine for communication with the
device's cpu will be in a bus address which is lower then 39 bits. Later,
Goya will try to increase that mask to 48 bits, but only if setting the
mask to 39 bits was successful.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch fixes the implementation of suspend/resume of the device so that
upon resume of the device, the host won't crash due to PCI completion
timeout.
Upon suspend, the device is being reset due to PERST. Therefore, upon
resume, the driver must initialize the PCI controller as if the driver was
loaded. If the controller is not initialized and the device tries to
access the device through the PCI bars, the host will crash with PCI
completion timeout error.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
When parsing the address of an internal command buffer, the driver prints
an error if the buffer's address is not in the range of the device's DRAM
or SRAM memory address space.
Use %px to print the real address that the user gave the driver and not a
hashed value, so the user will get a clue regarding the origin of his
error.
Note that if the print occurs, the pointer that is printed is a
user's virtual address and not some kind of physical address.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>