This is a HUGE commit, but it's not nearly as bad as it looks - any problems
can be isolated to a particular chipset and engine combination. It was
simply too difficult to port each one at a time, the compat layers are
*already* ridiculous.
Most of the changes here are simply to the glue, the process for each of the
engine modules was to start with a standard skeleton and copy+paste the old
code into the appropriate places, fixing up variable names etc as needed.
v2: Marcin Slusarz <marcin.slusarz@gmail.com>
- fix find/replace bug in license header
v3: Ben Skeggs <bskeggs@redhat.com>
- bump indirect pushbuf size to 8KiB, 4KiB barely enough for userspace and
left no space for kernel's requirements during GEM pushbuf submission.
- fix duplicate assignments noticed by clang
v4: Marcin Slusarz <marcin.slusarz@gmail.com>
- add sparse annotations to nv04_fifo_pause/nv04_fifo_start
- use ioread32_native/iowrite32_native for fifo control registers
v5: Ben Skeggs <bskeggs@redhat.com>
- rebase on v3.6-rc4, modified to keep copy engine fix intact
- nv10/fence: unmap fence bo before destroying
- fixed fermi regression when using nvidia gr fuc
- fixed typo in supported dma_mask checking
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is all very much a policy thing, and hence will not belong in SW
after the rework.
engsw now only handles receiving the event to say "can flip now" and makes
a callback to perform the actual work.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Still the same code, but not an "engine" anymore. The fence code is more of
a policy decision rather than exposing mechanisms, so it's not appropriate
to port it to the new engine subsystem.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Future work will be headed in the way of separating the policy supplied by
the nouveau drm module from the mechanisms provided by the driver core.
There will be a couple of major classes (subdev, engine) of driver modules
that have clearly defined tasks, and the further directory structure change
is to reflect this.
No code changes here whatsoever, aside from fixing up a couple of include
file pathnames.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Now have a somewhat simpler semaphore sync implementation for nv17:nv84,
and a switched to using semaphores as fences on nv84+ and making use of
the hardware's >= acquire operation.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Just a cleanup more or less, and to remove the need for special handling of
software objects.
This removes a heap of documentation on dma/graph object formats. The info
is very out of date with our current understanding, and is far better
documented in rnndb in envytools git.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Wait loop can be interrupted by signal, so if signals are raised
periodically (e.g. SIGALRM) this loop may never finish. Use
emission time as a base for fence timeout.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This adds prime->fd and fd->prime support to nouveau,
it passes the SG object to TTM, and then populates the
GART entries using it.
v2: add stubbed kmap + use new function to fill out pages array
for faulting + add reimport test.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I want to be able to use REF_CNT from other places in the kernel without
pushing a fence object onto the list of emitted fences.
The current code makes an assumption that every time the acked sequence is
bumped that there's at least one fence on the list that'll be signalled.
This will no longer be true in the near future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Fence lock needs to be initialized before any call to nouveau_channel_put
because it calls nouveau_channel_idle->nouveau_fence_update which uses
fence lock.
BUG: spinlock bad magic on CPU#0, test/24134
lock: ffff88019f90dba8, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
Pid: 24134, comm: test Not tainted 3.0.0-nv+ #800
Call Trace:
spin_bug+0x9c/0xa3
do_raw_spin_lock+0x29/0x13c
_raw_spin_lock+0x1e/0x22
nouveau_fence_update+0x2d/0xf1
nouveau_channel_idle+0x22/0xa0
nouveau_channel_put_unlocked+0x84/0x1bd
nouveau_channel_put+0x20/0x24
nouveau_channel_alloc+0x4ec/0x585
nouveau_ioctl_fifo_alloc+0x50/0x130
drm_ioctl+0x289/0x361
do_vfs_ioctl+0x4dd/0x52c
sys_ioctl+0x42/0x65
system_call_fastpath+0x16/0x1b
It's easily triggerable from userspace.
Additionally remove double initialization of chan->fence.pending.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The HW will only accept the DMA_FROM_MEMORY class for DMA_SEMAPHORE without
asking the driver to intervene.
It appears that semaphores will work correctly even without DMA_IN_MEMORY,
so lets avoid the large amount of interrupts generated by x-chan sync.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The DDX modifies DMA_SEMAPHORE on nv50 in order to implement sync-to-vblank,
things will go very wrong for cross-channel sync after this.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Commit 21e86c1c8a ("drm/nouveau: remove
cpu_writers lock") turned on lazy waits. Unfortunately
__nouveau_fence_wait was not optimized for this case and on HZ=100
kernel wasted up to 10 ms per call.
Depending on application, it led to 10-30% FPS regression.
Fix it.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
'mappable' isn't really used at all, nor is it necessary anymore as the
bo code is capable of moving buffers to mappable vram as required.
'no_vm' isn't necessary anymore either, any places that don't want to be
mapped into a GPU address space should allocate the VRAM directly instead.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This gives a small, but noticeable performance gain at lower performance
levels, and unchanged at the higher ones.
With this commit, we're now using the same timeslice size as the NVIDIA
binary driver currently does, and dropping an unknown bit that NVIDIA
no longer appear to set.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We may well be making more use of semaphores in the future, having the
entire VM available makes requiring DMA objects for each and every
semaphore block unnecessary.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
These are the same semaphores nvc0 will use, and they potentially allow
us to do much cooler things than our current inter-channel sync impl.
Lets switch to them where possible now for some testing.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Current 3D driver expects this behaviour. While this could be changed,
there's no compelling reason to reserve more than one subchannel for the
DRM. If we ever need to use an object other then M2MF, we can just
re-bind subchannel 0 as required.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
At some point in the future, this bo won't necessarily be backed by
a drm_mm_node, so use the start/size fields of the ttm_mem_reg instead.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Sleeping doesn't pay off for very short delays in comparison with the
minimum granularity of schedule_timeout().
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
nouveau_fence_* functions are not type safe, which could lead to bugs.
Additionally every use of nouveau_fence_unref had to cast struct
nouveau_fence to void **.
Fix it by renaming old functions and creating static inline functions with
new prototypes. We still need old functions, because we pass function
pointers to ttm.
As we are wrapping functions, drop unused "void *arg" parameter where possible.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The structs themselves, as well as the non-sw object creation function are
probably very misnamed now. That's a problem for later :)
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Without it there's a potential race with nouveau_fence_update().
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
It needs a "strong" channel reference because it actually writes to
the channel pushbuf, otherwise the corresponding FIFO context could
get kicked off in the middle of nouveau_fence_sync().
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Fences didn't increment the channel reference count, and the fenced
channel could go away at any time. Fixes a potential race in
nouveau_fence_update().
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
It fixes a race between the TTM delayed work queue and the GEM IOCTLs
(fdo bug 29583) uncovered by the BKL removal.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
set_current_state() is called only once before the first iteration.
After return from schedule_timeout() current state is TASK_RUNNING. If
we are going to wait again, set_current_state() must be called.
Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>