On some SoCs like MSM8939 with A405 adreno, there is a gfx_tbu clock
needs to be on while doing TLB invalidate. Otherwise, TLBSYNC status
will not be correctly reflected, causing the system to go into a bad
state. Add it as an optional clock, so that platforms that have this
clock can pass it over DT.
While adding the third clock, let's switch to bulk clk API to simplify
the enable/disable calls. clk_bulk_get() cannot used because the
existing two clocks are required while the new one is optional.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20200518141656.26284-1-shawn.guo@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When renesas module is not built, we get compiler warning on xhci driver
with W=1
CC [M] drivers/usb/host/xhci-rcar.o
drivers/usb/host/xhci-pci.h:13:5: warning: no previous prototype for ‘renesas_xhci_check_request_fw’ [-Wmissing-prototypes]
int renesas_xhci_check_request_fw(struct pci_dev *dev,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/usb/host/xhci-pci.h:19:6: warning: no previous prototype for ‘renesas_xhci_pci_exit’ [-Wmissing-prototypes]
void renesas_xhci_pci_exit(struct pci_dev *dev) { };
^~~~~~~~~~~~~~~~~~~~~
We have defined these symbols when CONFIG_USB_XHCI_PCI_RENESAS is not
defined, but missed making then static.
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: 8bd5741e31 ("usb: renesas-xhci: Add the renesas xhci driver")
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Link: https://lore.kernel.org/r/20200519093002.1152144-1-vkoul@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For the 32-bit kernel, as described in
6d92bc9d48 ("x86/build: Build compressed x86 kernels as PIE"),
pre-2.26 binutils generates R_386_32 relocations in PIE mode. Since the
startup code does not perform relocation, any reloc entry with R_386_32
will remain as 0 in the executing code.
Commit
974f221c84 ("x86/boot: Move compressed kernel to the end of the
decompression buffer")
added a new symbol _end but did not mark it hidden, which doesn't give
the correct offset on older linkers. This causes the compressed kernel
to be copied beyond the end of the decompression buffer, rather than
flush against it. This region of memory may be reserved or already
allocated for other purposes by the bootloader.
Mark _end as hidden to fix. This changes the relocation from R_386_32 to
R_386_RELATIVE even on the pre-2.26 binutils.
For 64-bit, this is not strictly necessary, as the 64-bit kernel is only
built as PIE if the linker supports -z noreloc-overflow, which implies
binutils-2.27+, but for consistency, mark _end as hidden here too.
The below illustrates the before/after impact of the patch using
binutils-2.25 and gcc-4.6.4 (locally compiled from source) and QEMU.
Disassembly before patch:
48: 8b 86 60 02 00 00 mov 0x260(%esi),%eax
4e: 2d 00 00 00 00 sub $0x0,%eax
4f: R_386_32 _end
Disassembly after patch:
48: 8b 86 60 02 00 00 mov 0x260(%esi),%eax
4e: 2d 00 f0 76 00 sub $0x76f000,%eax
4f: R_386_RELATIVE *ABS*
Dump from extract_kernel before patch:
early console in extract_kernel
input_data: 0x0207c098 <--- this is at output + init_size
input_len: 0x0074fef1
output: 0x01000000
output_len: 0x00fa63d0
kernel_total_size: 0x0107c000
needed_size: 0x0107c000
Dump from extract_kernel after patch:
early console in extract_kernel
input_data: 0x0190d098 <--- this is at output + init_size - _end
input_len: 0x0074fef1
output: 0x01000000
output_len: 0x00fa63d0
kernel_total_size: 0x0107c000
needed_size: 0x0107c000
Fixes: 974f221c84 ("x86/boot: Move compressed kernel to the end of the decompression buffer")
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200207214926.3564079-1-nivedita@alum.mit.edu
The patch_cb_size is not updated for Wreg32 in its validate function, so
updated in goya_validate_cb.
Signed-off-by: Rachel Stahl <rstahl@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Instead of writing similar event handling code for each ASIC, move the code
to the common firmware file. This code will be used for GAUDI and all
future ASICs.
In addition, add two new fields to the auto-generated events file: valid
and description. This will save the need to manually write the events
description in the source code and simplify the code.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Enable the GAUDI ASIC code in the pci probe callback of the driver so the
driver will handle GAUDI ASICs.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Add the GAUDI code to initialize the ASIC's profiler. The profile receives
its initialization values from the user, same as in Goya, but the code to
initialize is in the driver because the configuration space of the
device is not directly exposed to the user.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Add the code to initialize the security module of GAUDI. Similar to Goya,
we have two dedicated mechanisms for security: Range Registers and
Protection bits. Those mechanisms protect sensitive memory and
configuration areas inside the device.
In addition, in Gaudi we moved to a 3-level security scheme, where the F/W
runs with the highest security level (Privileged), the driver runs with a
less secured level (Secured) and the user is neither privileged nor
secured. The security module in the driver configures the Secured parts so
the user won't be able to access them. The Privileged parts are configured
by the F/W.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The hwmgr module is responsible for messages sent to GAUDI F/W that are
not common to all habanalabs ASICs.
In GAUDI, we provide the user a simplified mode of controlling the ASIC
clock frequency. Instead of three different clocks, we present a single
clock property that the user can configure via sysfs.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Add the ASIC-dependent code for GAUDI. Supply (almost) all of the function
callbacks that the driver's common code need to initialize, finalize and
submit workloads to the GAUDI ASIC.
It also contains the code to initialize the F/W of the GAUDI ASIC and to
receive events from the F/W.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Add the new defines for GAUDI uapi interface. It includes the queue IDs,
the engine IDs, SRAM reserved space and Sync Manager reserved resources.
There is no new IOCTL or additional operations in existing IOCTLs.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Add the relevant GAUDI ASIC registers header files. These files are
generated automatically from a tool maintained by the VLSI engineers.
There are more files which are not upstreamed because only very few defines
from those files are used in the driver. For those files, we copied the
relevant defines into gaudi_regs.h and gaudi_masks.h, to reduce the size of
this patch.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
For Gaudi the driver gets two new additional properties from the F/W:
1. The card's type - PCI or PMC
2. The card's location in the Gaudi's box (relevant only for PMC).
The card's location is also passed to the user in the HW IP info structure
as it needs this property for establishing communication between Gaudis.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
In Gaudi there is a feature of clock gating certain engines.
Therefore, add this property to the device structure.
In addition, due to a limitation of this feature, the driver needs to
dynamically enable or disable this feature during run-time. Therefore, add
ASIC interface functions to enable/disable this function from the common
code.
Moreover, this feature must be turned off when the user wishes to debug the
ASIC by reading/writing registers and/or memory through the driver's
debugfs. Therefore, add an option to enable/disable clock gating via the
debugfs interface.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
For Gaudi, the driver doesn't change the PM profile automatically due to
device-controlled PM capabilities. Therefore, set the PM profile to auto
only for Goya so the driver's code to automatically change the profile
won't run on Gaudi.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Gaudi requires longer waiting during reset due to closing of network ports.
Add this explanation to the relevant comment in the code and add a
dedicated define for this reset timeout period, instead of multiplying
another define.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Coresight is not supported on simulator, therefore add a boolean for
checking that (currently used by un-upstreamed code).
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Add the following two operations to the CS IOCTL:
Signal:
The signal operation is basically a command submission, that is created by
the driver upon user request. It will be implemented using a dedicated PQE
that will increment a specific SOB. There will be a new flag:
HL_CS_FLAGS_SIGNAL. When the user set this flag in the CS IOCTL structure,
the driver will execute a dedicated code path that will prepare this
special PQE and submit it. The user only needs to provide a queue index on
which to put the signal.
Wait:
The wait operation is also a command submission that is created by the
driver upon user request. It will be implemented using a dedicated PQE that
will contain packets of "ARM a monitor" + FENCE packet. There will be a new
flag: HL_CS_FLAGS_WAIT. When the user set this flag in the CS structure,
the driver will execute a dedicated code path that will prepare this
special PQE and submit it.
The user needs to provide the following parameters:
1. queue ID
2. an array of signal_seq numbers and the number of signals to wait on
(the length of signal_seq_arr).
The IOCTL will return the CS sequence number of the wait it put on the
queue ID.
Currently, the code supports signal_seq_nr==1. But this API definition will
allow us to put a single PQE that waits on multiple signals.
To correctly configure the monitor and fence, the driver will need to
retrieve the specified signal CS object that contains the relevant SOB and
its expected value. In case the signal CS has already been completed, there
is no point of adding a wait operation. In this case, the driver will
return to the user *without* putting anything on the PQ. The return code
should reflect to the user that the signal was completed, as we won't
return a CS sequence number for this wait.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Define a structure representing the h/w sync object (SOB).
a SOB can contain up to 2^15 values. Each signal CS will increment the SOB
by 1, so after some time we will reach the maximum number the SOB can
represent. When that happens, the driver needs to move to a different SOB
for the signal operation.
A SOB can be in 1 of 4 states:
1. Working state with value < 2^15
2. We reached a value of 2^15, but the signal operations weren't completed
yet OR there are pending waits on this signal. For the next submission, the
driver will move to another SOB.
3. ALL the signal operations on the SOB have finished AND there are no more
pending waits on the SOB AND we reached a value of 2^15 (This basically
means the refcnt of the SOB is 0 - see explanation below). When that
happens, the driver can clear the SOB by simply doing WREG32 0 to it and
set the refcnt back to 1.
4. The SOB is cleared and can be used next time by the driver when it needs
to reuse an SOB.
Per SOB, the driver will maintain a single refcnt, that will be initialized
to 1. When a signal or wait operation on this SOB is submitted to the PQ,
the refcnt will be incremented. When a signal or wait operation on this SOB
completes, the refcnt will be decremented. After the submission of the
signal operation that increments the SOB to a value of 2^15, the refcnt is
also decremented.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This feature requires handling h/w resources which are a bit different from
one ASIC to the other. Therefore, we need to define a set of interfaces the
ASIC code provides to the common code to signal, wait, reset sync object
and to reset and init a queue.
As this feature is not supported in Goya, provide an empty implementation
of those functions.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
This is a pre-requisite to upstreaming GAUDI support.
Signal/wait operations are done by the user to perform sync between two
Primary Queues (PQs). The sync is done using the sync manager and it is
usually resolved inside the device, but sometimes it can be resolved in the
host, i.e. the user should be able to wait in the host until a signal has
been completed.
The mechanism to define signal and wait operations is done by the driver
because it needs atomicity and serialization, which is already done in the
driver when submitting work to the different queues.
To implement this feature, the driver "takes" a couple of h/w resources,
and this is reflected by the defines added to the uapi file.
The signal/wait operations are done via the existing CS IOCTL, and they use
the same data structure. There is a difference in the meaning of some of
the parameters, and for that we added unions to make the code more
readable.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
PCI drivers should use this define to declare their PCI ID table.
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Make all the CB handles printed in the same way and not some as decimal and
some as hex numbers.
Signed-off-by: Dotan Barak <dbarak@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Update the mapping to the latest one used by the Firmware. No impact on the
driver in this update.
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Set the STMTCSR.COMPEN bit to enable leading-zero trace data
compression functionality for the extended stimulus ports.
Signed-off-by: Adam Aharon <aaharon@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Load CPU device boot loader during driver boot time in order to avoid flash
write for every boot loader update.
To preserve backward-compatibility, skip the device boot load if the device
doesn't request it.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The user must leave space for 2xMSG_PROT in the external CB, so adjust the
define of max size accordingly. The driver, however, can still create a CB
with the maximum size of 2MB. Therefore, we need to add a check
specifically for the user requested size.
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Align the protection bits configuration of all TPC cores to be as of TPC
core 0.
Fixes: a513f9a7ec ("habanalabs: make tpc registers secured")
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Add a new opcode to the INFO IOCTL that retrieves the device time
alongside the host time, to allow a user application that want to measure
device time together with host time (such as a profiler) to synchronize
these times.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>