When using the DMA-API for instmem, we may obtain a write-combined
mapping. For such cases, add a write barrier in
gk20a_instobj_release_dma() to make sure that all writes have reached
memory at this time.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Commit 69c4938249 ("drm/nouveau/instmem/gk20a: use direct CPU access")
tried to be smart while using the DMA-API by managing the CPU mappings of
buffers allocated with the DMA-API by itself. In doing so, it relied
on dma_to_phys() which is an architecture-private function not
available everywhere. This broke the build on several architectures.
Since there is no reliable and portable way to obtain the physical
address of a DMA-API buffer, stop trying to be smart and just use the
CPU mapping that the DMA-API can provide. This means that buffers will
be CPU-mapped for all their life as opposed to when we need them, but
anyway using the DMA-API here is a fallback for when no IOMMU is
available so we should not expect optimal behavior.
This makes the IOMMU and DMA-API implementations of instmem diverge
enough that we should maybe put them into separate files...
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The LRU list used for recycling CPU mappings was handling concurrency
very poorly. For instance, if an instobj was acquired twice before being
released once, it would end up into the LRU list even though there is
still a client accessing it.
This patch fixes this by properly counting how many clients are
currently using a given instobj.
While at it, we also raise errors when inconsistencies are detected, and
factorize some code.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
gk20a is an ARM only GPU, so we can just do the correct thing on
ARM but fail on other architectures. The other option was to use
SWIOTLB as the define, which means phys_to_page exists, but
this seems clearer.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Use the IOMMU bit specified in platform data instead of hardcoding it to
the bit used by current Tegra GPUs.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The Great Nouveau Refactoring Take II brought us a lot of goodness,
including acquire/release methods that are called before and after an
instobj is modified. These functions can be used as synchronization
points to manage CPU/GPU coherency if we modify an instobj using the
CPU.
This patch replaces the legacy and slow PRAMIN access for gk20a instmem
with CPU mappings and writes. A LRU list is used to unmap unused
mappings after a certain threshold (currently 1MB) of mapped instobjs is
reached. This allows mappings to be reused most of the time.
Accessing instobjs using the CPU requires to maintain the GPU L2 cache,
which we do in the acquire/release functions. This triggers a lot of L2
flushes/invalidates, but most of them are performed on an empty cache
(and thus return immediately), and overall context setup performance
greatly benefits from this (from 250ms to 160ms on Jetson TK1 for a
simple libdrm program).
Making L2 management more explicit should allow us to grab some more
performance in the future.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The copyright header in nvkm/engine/device/platform.c has been replaced
with the NVIDIA one from drm/nouveau_platform.c, as most of the actual
code is now theirs.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Pretty much every subdev/engine is going to need access to nvkm_device
shortly to touch registers and/or output messages.
The odd placement of the includes is necessary to work around some
inter-dependencies that currently exist. This will be fixed later.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
If a memory allocation fails when using the DMA allocator,
gk20a_instobj_dtor_dma() will be called on the failed instmem object.
At this time, node->handle might not be NULL despite the call to
dma_alloc_attrs() having failed. node->cpuaddr is the right member to
check for such a failure, so use it instead.
Reported-by: Vince Hsu <vinceh@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Let GK20A's instmem take advantage of the IOMMU if it is present. Having
an IOMMU means that instmem is no longer allocated using the DMA API,
but instead obtained through page_alloc and made contiguous to the GPU
by IOMMU mappings.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
instmem for GK20A is allocated using dma_alloc_coherent(), which
provides us with a coherent CPU mapping that we never use because
instmem objects are accessed through PRAMIN. Switch to
dma_alloc_attrs() which gives us the option to dismiss that CPU mapping
and free up some CPU virtual space.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
GK20A does not have dedicated RAM, thus having a RAM device for it does
not make sense. Move the contiguous physical memory allocation to
instmem.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>