habanalabs: there is no kernel TDR in future ASICs

In future ASICs, there is no kernel TDR for new workloads that are
submitted directly from user-space to the device.

Therefore, the driver can NEVER know that a workload has timed-out.

So, when the user asks us to wait for interrupt on the workload's
completion, and the wait has timed-out, it doesn't mean the workload
has timed-out. It only means the wait has timed-out, which is NOT an
error from driver's perspective.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
This commit is contained in:
Oded Gabbay 2022-01-13 10:05:38 +02:00
parent 4c01e524b2
commit e24a62cb68

View File

@ -2932,11 +2932,14 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
rc = -EIO;
*status = HL_WAIT_CS_STATUS_ABORTED;
} else {
dev_err_ratelimited(hdev->dev, "Waiting for interrupt ID %d timedout\n",
interrupt->interrupt_id);
rc = -ETIMEDOUT;
/* The wait has timed-out. We don't know anything beyond that
* because the workload wasn't submitted through the driver.
* Therefore, from driver's perspective, the workload is still
* executing.
*/
rc = 0;
*status = HL_WAIT_CS_STATUS_BUSY;
}
*status = HL_WAIT_CS_STATUS_BUSY;
}
}
@ -3049,6 +3052,12 @@ wait_again:
interrupt->interrupt_id);
rc = -EINTR;
} else {
/* The wait has timed-out. We don't know anything beyond that
* because the workload wasn't submitted through the driver.
* Therefore, from driver's perspective, the workload is still
* executing.
*/
rc = 0;
*status = HL_WAIT_CS_STATUS_BUSY;
}