linux

Author	SHA1	Message	Date
Ohad Sharabi	5b6b780660	habanalabs: update security map after init CPU Qs when reading CPU_BOOT_DEV_STS0 reg after FW reports SRAM AVAILABLE the value in the register might not yet be updated by FW. to overcome this issue another "up-to-date" read of this register is done at the end of CPU queues init. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-02-08 18:20:08 +02:00
Oded Gabbay	28bcf1fdc4	habanalabs: enable F/W events after init done Only after the initialization of the device is done, the driver is ready to receive events from the F/W. The driver can't handle events before that because of races so it will ignore events. In case of a fatal event, the driver won't know about it and the device will be operational although it shouldn't be. Same logic should be applied after hard-reset. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-02-08 18:20:08 +02:00
Ohad Sharabi	b520ca5d82	habanalabs/gaudi: use HBM_ECC_EN bit for ECC ERR driver should use ECC info from FW only if HBM ECC CAP is set. otherwise, try to fetch the data from MC regs only if security is disabled. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-02-08 18:20:08 +02:00
Ofir Bitton	e52606d2f5	habanalabs: support fetching first available user CQ User must be aware of the available CQs when it needs to use them. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-02-08 18:20:08 +02:00
Ofir Bitton	5dbd7b4de6	habanalabs: improve communication protocol with cpucp Current messaging communictaion protocol with cpucp can get out of sync due to coherency issues. In order to improve the protocol reliability, we modify the protocol to expect a different acknowledgment for every packet sent to cpucp. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-02-08 18:20:08 +02:00
Oded Gabbay	6c1e3f92f9	habanalabs: fix integer handling issue Need to add ull suffix to constant when doing shift of constant into 64-bit variables Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-02-08 18:20:08 +02:00
Oded Gabbay	f1aebf5e3d	habanalabs: update to latest hl_boot_if.h spec from F/W It adds the definition for indication that the F/W handles HBM ECC events. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:51 +02:00
Oded Gabbay	230cd89480	habanalabs/gaudi: unmask HBM interrupts after handling As the driver does with all interrupts, we need to tell F/W to unmask the HBM interrupts after the driver handled them. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:51 +02:00
Oded Gabbay	7838504171	habanalabs: update SyncManager interrupt handling The firmware provides more information about SyncManager events. Adjust the code to the latest firmware interface file. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:51 +02:00
Ohad Sharabi	663a301d75	habanalabs: fix ETR security issue ETR should always be non-secured as it is used by the users to record profiling/trace data. This patch fixes the configuration to match those requirements. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:51 +02:00
Ofir Bitton	2795c88915	habanalabs: staged submission support We introduce a new mechanism named Staged Submission. This mechanism allows the user to send a whole CS in pieces. Each CS will not require completion rather than the last CS. Timeout timer will be triggered upon reception of the first CS in group. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:51 +02:00
Ohad Sharabi	cf30339d3f	habanalabs: modify device_idle interface Currently this API uses single 64 bits mask for engines idle indication. Recently, it was observed that more bits are needed for some ASICs. This patch modifies the use of the idle mask and the idle_extensions mask. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:51 +02:00
Ofir Bitton	0811b39146	habanalabs: add CS completion and timeout properties In order to support staged submission feature, we need to distinguish on which command submission we want to receive timeout and for which we want to receive completion. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:51 +02:00
Ofir Bitton	d00697fbe1	habanalabs: add new mem ioctl op for mapping hw blocks For future ASIC support the driver allows user to map certain regions in the device's configuration space for direct access from userspace. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:51 +02:00
farah kassabri	89473a1fc3	habanalabs: fix MMU debugfs related nodes In mmu debugfs node show un-scrambled physical addresses. before read/write through data nodes, need to unscramble the physical address before using it for pci transaction. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	e1fa724dd1	habanalabs: add user available interrupt to hw_ip In order to support completions that arrive directly to the user, the driver needs to supply the user with the first available msix interrupt available. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
farah kassabri	8d79ce162e	habanalabs: always try to use the hint address Currently hint address is ignored in case va block page size is not power of 2. We need to support th user hint address also in this case, but only if the hint address is aligned to page size. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	d2b980f329	habanalabs: add security violations dump to debugfs In order to improve driver security debuggability, we add security violations dump to debugfs. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	eea4c2557c	habanalabs: ignore F/W BMC errors in case no BMC present In order to support operation mode in which BMC is not active, driver must not take BMC errors into consideration. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	f8bc7f091c	habanalabs/gaudi: print sync manager SEI interrupt info Driver must print sync manager SEI information upon receiving interrupt from FW. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Christophe JAILLET	825b30c4f3	habanalabs: Use 'dma_set_mask_and_coherent()' Axe 'hl_pci_set_dma_mask()' and replace it with an equivalent 'dma_set_mask_and_coherent()' call. This makes the code a bit less verbose. It also removes an erroneous comment, because 'hl_pci_set_dma_mask()' does not try to use a fall-back value. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	423815bf02	habanalabs/gaudi: remove PCI access to SM block Due to HW limitation we must remove all direct access to SM registers, in order to do that we will access SM registers using the HW QMANS. When possible and no user context is present, we can directly access the HW QMANS. Whenever there is an active user, driver will prepare a pending command buffer list which will be sent upon user submissions. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	d3f139c462	habanalabs: add driver support for internal cb scheduling In order to support scnenarios in which driver needs access to HW components but it cannot access them directly, we add support for scheduling command buffers internally. These command buffers will be transmitted upon next user command submission context. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	1e3f2536a8	habanalabs: increment ctx ref from within a cs allocation A CS must increment the relevant context reference count. We want to increment the reference inside the CS allocation function as opposed for today where we increment it outside. This is logical since we want to avoid explicitly incrementing the context every time we call the CS allocate function. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	8563e19159	habanalabs: separate common code to dedicated folders We separate some of the common code source files to different folders for a better maintainability and testability. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	edb07cb69c	habanalabs: read device boot errors after cpucp is up Boot cpu can report errors in various boot stages. Current implementaion does not take into consideration errors reported in late stages, hence we will check for errors at the most late stage when fetching cpucp information. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:50 +02:00
Ofir Bitton	6769cea8de	habanalabs: report correct dram size in info ioctl In case MMU is enabled, we must take MMU page size into consideration when reporting dram size to the user. This is because the MMU page size can be a value which is NOT a power-of-2 value. As a result, the total DRAM size (which is always a power-of-2 value) needed to be rounded-down. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Moti Haimovski	b19dc67aa8	habanalabs: support non power-of-2 DRAM phys page sizes DRAM physical page sizes depend of the amount of HBMs available in the device. this number is device-dependent and may also be subject to binning when one or more of the DRAM controllers are found to to be faulty. Such a configuration may lead to partitioning the DRAM to non-power-of-2 pages. To support this feature we also need to add infrastructure of address scarmbling. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Ofir Bitton	a1f8533269	habanalabs: remove access to kernel memory using debugfs Accessing kernel allocated memory through debugfs should not be allowed as it introduces a security vulnerability. We remove the option to read/write kernel memory for all asics. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Ofir Bitton	266cdfa2b7	habanalabs/gaudi: set uninitialized symbol Initialize local variable that is returned by the function, in case it is never assigned. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Alon Mizrahi	9402a33624	habanalabs: return dram virtual address in info ioctl When working with DRAM MMU, we should supply the userspace with the virtual start address of the DRAM instead of the physical one. This is because the physical one has no meaning for the user as he only knows the virtual address range. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Oded Gabbay	3abe1040ba	habanalabs: update to latest hl_boot_if.h Update the latest version of this file that the F/W exports Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Oded Gabbay	1530d46817	habanalabs: add ASIC property of functional HBMs The number of functional HBMs in the same ASIC can be different due to malfunctioning HBM banks. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Ofir Bitton	2e36856008	habanalabs/gaudi: add debug prints for security status In order to have more information while debugging boot issues, we should print the firmware security status at every boot stage. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Omer Shpigelman	f19040ce41	habanalabs: modify memory functions signatures For consistency, modify all memory ioctl functions to get the ioctl arguments structure rather than the arguments themselves. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Omer Shpigelman	3b762f55aa	habanalabs: kernel doc format in memory functions Change all memory functions documentation according to kernel doc format. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Alon Mizrahi	75d9a2a0aa	habanalabs: replace WARN/WARN_ON with dev_crit in driver Often WARN is defined in data-centers as BUG and we would like to avoid hanging the entire server on some internal error of the driver (important as it might be). Therefore, use dev_crit instead. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Moti Haimovski	0eda23d77e	habanalabs: report dram_page_size in hw_ip_info ioctl Instead of having it hard-coded as a define, pass it to the user in runtime. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Ohad Sharabi	e1b85dbaf0	habanalabs/goya: move mmu_prepare to context init Currently mmu_prepare is located at context switch. Since we support a single context, no reason to reconfigure the MMU registers every context switch. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:49 +02:00
Ofir Bitton	f8b0f2ecc5	habanalabs/gaudi: remove duplicated gaudi packets masks As all packets use the same CTL register masks, we remove duplicated masks and use common masks instead. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:48 +02:00
Ofir Bitton	c209e74214	habanalabs: allow user to pass a staged submission seq In order to support the staged submission feature, user must be allowed to use the same CS sequence for all submissions in the same staged submission. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:48 +02:00
Ofir Bitton	ac6fdbfe2e	habanalabs/gaudi: support CS with no completion As part of the staged submission feature, we need Gaudi to support command submissions that will never get a completion. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:48 +02:00
Ofir Bitton	8e39e75a13	habanalabs: Init the VM module for kernel context In order for reserving VA ranges for kernel memory, we need to allow the VM module to be initiated with kernel context. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:48 +02:00
Ohad Sharabi	cb6ef0ee6d	habanalabs: refactor MMU locks code remove mmu_cache_lock as it protects a section which is already protected by mmu_lock. in addition, wrap mmu cache invalidate calls in hl_vm_ctx_fini with mmu_lock. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:48 +02:00
Oded Gabbay	4c998836d4	habanalabs: update firmware boot interface Update to latest firmware hl_boot_if.h file. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-27 21:03:48 +02:00
Oded Gabbay	2dc4a6d791	habanalabs: disable FW events on device removal When device is removed, we need to make sure the F/W won't send us any more events because during the remove process we disable the interrupts. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-21 20:30:22 +02:00
Oded Gabbay	f8abaf379b	habanalabs: fix backward compatibility of idle check Need to take the lower 32 bits of the driver's 64-bit idle mask and put it in the legacy 32-bit variable that the userspace reads to know the idle mask. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-21 20:30:22 +02:00
Ofir Bitton	9354f1b421	habanalabs: zero pci counters packet before submit to FW Driver does not zero some pci counters packets before sending to FW. This causes an out of sync PI/CI between driver and FW. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-21 20:30:22 +02:00
Oded Gabbay	9488307a55	habanalabs: prevent soft lockup during unmap When using Deep learning framework such as tensorflow or pytorch, there are tens of thousands of host memory mappings. When the user frees all those mappings at the same time, the process of unmapping and unpinning them can take a long time, which may cause a soft lockup bug. To prevent this, we need to free the core to do other things during the unmapping process. For now, we chose to do it every 32K unmappings (each unmap is a single 4K page). Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-12 15:00:10 +02:00
Oded Gabbay	aa6df6533b	habanalabs: fix reset process in case of failures There are some points in the reset process where if the code fails for some reason, and the system admin tries to initiate the reset process again we will get a kernel panic. This is because there aren't any protections in different fini functions that are called during the reset process. The protections that are added in this patch make sure that if the fini functions are called multiple times, without calling init functions between them, there won't be double release of already released resources. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2021-01-12 14:59:52 +02:00

1 2 3 4 5 ...

535 Commits