There are 3 big benefits to suballocating a single big DMA buffer
for command submission:
1. Avoid hammering CMA. The old way of allocating and freeing a DMA
buffer for each submission was hitting some of the real slow
pathes in CMA, as this allocator was not designed for a concurrent
small buffers load.
2. Less TLB flushes on IOMMUv2. If a new command buffer is mapped into
the GPU address space the MMU TLBs need to be flushed. By having
one big buffer statically mapped to the GPU, a lot of those flushes
can be avoided.
3. No funky workarounds for GC3000. The FE TLB flush on GC3000 isn't
reliable. To work around that we tried to lay out the cmdbufs in
the GPU address space in a way to avoid this issue. This hasn't
always worked if the address space is crowded. A single statically
mapped buffer avoids the erratum completely.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Don't allow IOMMUv2 to peek directly into the cmdbuf, but get the
needed PA through a dedicated function.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Don't call the IOMMU directly, but go through the new cmdbuf abstraction.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This will get more complex with the following changes, so move it
into its own place.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>