Fences attached to deferred client work items now originate from channels
belonging to the client, meaning we can be certain they've been signalled
before we destroy a client.
This closes a race that could happen if the dma_fence_wait_timeout() call
didn't succeed. When the fence was later signalled, a use-after-free was
possible.
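To make the window concrete, here's a minimal sketch of the old pattern;
the client_work structure and client_work_flush() below are hypothetical
stand-ins, not the actual nouveau code:

  #include <linux/dma-fence.h>
  #include <linux/jiffies.h>
  #include <linux/slab.h>

  /* Hypothetical per-client deferred work item. */
  struct client_work {
          struct dma_fence *fence; /* fence the deferred work waits on */
  };

  static void
  client_work_flush(struct client_work *work)
  {
          /*
           * Bounded wait on a fence that may belong to another client or
           * driver.  A return of 0 means the fence is still pending, yet
           * 'work' is freed below regardless; when the fence eventually
           * signals, anything still referencing 'work' from the signalling
           * path touches freed memory.  Sourcing work fences only from the
           * client's own channels removes this window, as those channels
           * are idled before the client is destroyed.
           */
          dma_fence_wait_timeout(work->fence, false, msecs_to_jiffies(2000));
          dma_fence_put(work->fence);
          kfree(work);
  }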
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
As VMAs are per-client, unlike buffers, this allows us to avoid referencing
foreign fences (those that belong to another client/driver) from the client
deferred work handler, and prevent some not-fun race conditions that can be
triggered when a fence stalls.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We previously only did this for push buffers, but an upcoming patch will
need to attach fences to all VMAs to resolve another issue.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
These were missed the first time around because the driver version I traced
was still using the older registers.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
There are differences on GM200 and newer too, but we can't fix them there
as they come from firmware packages.
A request has been made to NVIDIA to release updated firmware.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
There are a number of places that require this data, so let's separate out
the calculations to ensure they remain consistent.
This is incorrect for GM200 and newer, but produces the same results as we
did before.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
There are also a couple of hardcoded tables for some very specific
configurations that NVGPU's algorithm didn't work for.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The algorithm for GM200 and newer matches RM for all the boards I have, but
I don't have enough data to figure something out for earlier boards, so
these will still write zeroes to the table, as we did before.
The code in NVGPU isn't helpful here; it appears to only handle specific
cases.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
I don't think this is done after Fermi; NVGPU used to do it but removed the
code, and I've not seen RM traces touching it either.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
I haven't yet been able to find a fully programmatic way of calculating the
same mapping as NVIDIA for GF100-GF119, so the algorithm partially depends
on data tables for specific configurations.
I couldn't find traces for every possibility, so the algorithm will switch
to a mapping similar to what GK104-GM10x use if it encounters a
configuration that isn't covered. We did the wrong thing before anyway, so
this shouldn't matter too much.
The algorithm used in the GK104 implementation was ported from NVGPU.
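A rough sketch of that shape, with hypothetical names (gf100_map_config,
gf100_map_table, gf100_lookup_mapping) and the table contents elided:

  #include <linux/kernel.h>

  /* Traced mappings for specific configurations. */
  struct gf100_map_config {
          u32 key;       /* configuration the entry was traced on */
          const u8 *map; /* mapping taken from the trace */
  };

  static const struct gf100_map_config gf100_map_table[] = {
          /* entries filled in from traces of specific boards */
  };

  static const u8 *
  gf100_lookup_mapping(u32 key)
  {
          int i;

          /* Prefer a traced entry for this exact configuration. */
          for (i = 0; i < ARRAY_SIZE(gf100_map_table); i++) {
                  if (gf100_map_table[i].key == key)
                          return gf100_map_table[i].map;
          }

          /* Not covered: the caller falls back to a GK104-style mapping. */
          return NULL;
  }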
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Newer HW doesn't appear to send this event, which will cause long delays
in runlist updates if they don't complete immediately.
RM doesn't use these events anywhere, and an NVGPU commit message notes
that polling is the preferred method even on HW that supports the event.
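The replacement is just the usual nvkm poll loop, along the lines of the
sketch below; the register offset and pending bit are passed in here as
placeholders rather than the real runlist registers:

  #include <core/device.h>
  #include <subdev/timer.h>

  static int
  runlist_update_wait(struct nvkm_device *device, u32 reg, u32 pending)
  {
          /* Poll until the runlist-pending bit clears, or give up after 2s. */
          if (nvkm_msec(device, 2000,
                  if (!(nvkm_rd32(device, reg) & pending))
                          break;
          ) < 0)
                  return -ETIMEDOUT;

          return 0;
  }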
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We weren't previously aware that runlist and engine IDs aren't the same
thing, or that there was such variability in configuration between GPUs.
By exposing this information to a client, and giving it explicit control
of which runlist it's allocating a channel on, we're able to make better
choices.
The immediate effect of this is that on GPUs where CE0 is the "GRCE", we
will now be allocating a copy engine running asynchronously to GR for BO
migrations - as intended.
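For example, a client could use the new information along these lines; the
helpers and engine defines below are hypothetical, only the selection logic
is the point:

  #include <linux/bitops.h>
  #include <linux/errno.h>

  /* Hypothetical stand-ins for the new query/allocation interfaces. */
  u64 query_runlists_for_engine(int engine); /* bitmask of runlists */
  int channel_new_on_runlist(int runlist);   /* allocate a channel there */

  #define ENGINE_GR 0
  #define ENGINE_CE 1

  static int
  alloc_migration_channel(void)
  {
          u64 ce = query_runlists_for_engine(ENGINE_CE);
          u64 gr = query_runlists_for_engine(ENGINE_GR);
          u64 async_ce = ce & ~gr; /* CEs not sharing a runlist with GR */

          if (!ce)
                  return -ENODEV;

          /* Prefer an async CE; fall back to the GRCE if it's all we have. */
          return channel_new_on_runlist(__ffs64(async_ce ? async_ce : ce));
  }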
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We need to fetch data from GPU-specific sub-devices that isn't tied to any
particular engine object.
This commit provides the framework to support such queries.
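The rough shape of such a query path, with illustrative names only (none of
these are the actual interfaces):

  #include <linux/errno.h>
  #include <linux/types.h>

  /* Illustrative query, routed by sub-device rather than engine object. */
  struct device_query {
          u32 subdev; /* which sub-device owns the data */
          u32 type;   /* what is being asked for */
          u64 data;   /* answer, filled in by the sub-device */
  };

  typedef int (*query_handler)(struct device_query *);

  static int
  device_query_exec(struct device_query *q, query_handler *handlers, u32 nr)
  {
          if (q->subdev >= nr || !handlers[q->subdev])
                  return -ENODEV;

          return handlers[q->subdev](q); /* sub-device fills in q->data */
  }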
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This will be required to support Volta, but also allows us to remove code
that's already duplicated for each channel type.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Introduces a new method of defining the channels available from the
display, common to all channel types, allowing more flexibility in the
available channel types/counts and reducing the amount of boilerplate
required.
This will be required to support Volta.
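Conceptually it looks something like the sketch below; disp_chan_desc and
disp_chans_new are hypothetical names standing in for the real structures:

  /* One descriptor per channel type, shared by every display channel. */
  struct disp_chan_desc {
          const char *name;                  /* "core", "base", "ovly", ... */
          int nr;                            /* instances present on this GPU */
          int (*ctor)(int inst, void *priv); /* common constructor */
  };

  static int
  disp_chans_new(const struct disp_chan_desc *descs, int ndescs, void *priv)
  {
          int i, j, ret;

          /* Instantiate every channel from the table; no per-type code. */
          for (i = 0; i < ndescs; i++) {
                  for (j = 0; j < descs[i].nr; j++) {
                          ret = descs[i].ctor(j, priv);
                          if (ret)
                                  return ret;
                  }
          }

          return 0;
  }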
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Engines are initialised on an as-needed basis, so this results in the
same behaviour, whilst allowing us to simplify things a bit.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We should be reading registers to determine which subunits are really
present on a given board, and this needs to be done after DEVINIT.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Likely a rebase bug. Should have no impact in the default configuration, as
the per-instance setting is used by default.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
In preparation for enabling -Wvla, remove the VLA. In this particular case,
directly use the macro NVKM_MSGQUEUE_CMDLINE_SIZE instead of the local
variable cmdline_size, and remove cmdline_size as it is no longer useful.
The use of stack Variable Length Arrays needs to be avoided, as they can be
a vector for stack exhaustion, which can be either a runtime bug or a
security flaw. Also, in general, as code evolves it is easy to lose track
of how big a VLA can get, so we can end up with runtime failures that are
hard to debug.
This is also part of the directive to remove all VLAs from
the kernel: https://lkml.org/lkml/2018/3/7/621
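The change amounts to the following pattern; fill_cmdline() is a
hypothetical wrapper and the macro value shown is a placeholder (the real
definition lives in the msgqueue header):

  #include <linux/string.h>
  #include <linux/types.h>

  #define NVKM_MSGQUEUE_CMDLINE_SIZE 0x100 /* placeholder; see msgqueue.h */

  static void
  fill_cmdline(void *dst)
  {
          /*
           * Before: u32 cmdline_size = NVKM_MSGQUEUE_CMDLINE_SIZE;
           *         u8 buf[cmdline_size];   <-- VLA
           * After: size the array with the macro directly.
           */
          u8 buf[NVKM_MSGQUEUE_CMDLINE_SIZE];

          memset(buf, 0, sizeof(buf));
          memcpy(dst, buf, sizeof(buf));
  }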
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>