Sometimes we want to know whether a buffer is busy and wait for it (bo_wait).
However, sometimes it would be more useful to be able to query whether
a buffer is busy and being either read or written, and wait until it's stopped
being either read or written. The point of this is to be able to avoid
unnecessary waiting, e.g. if a GPU has written something to a buffer and is now
reading that buffer, and a CPU wants to map that buffer for read, it needs to
only wait for the last write. If there were no write, there wouldn't be any
waiting needed.
This, or course, requires user space drivers to send read/write flags
with each relocation (like we have read/write domains in radeon, so we can
actually use those for something useful now).
Now how this patch works:
The read/write flags should passed to ttm_validate_buffer. TTM maintains
separate sync objects of the last read and write for each buffer, in addition
to the sync object of the last use of a buffer. ttm_bo_wait then operates
with one the sync objects.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
GPU virtual addresses are constant now so this should never be getting hit
anyway and userspace shouldn't break from them being ignored.
This is being done in preference to teaching the code how to deal with BOs
that exist at different virtual addresses within separate VMs.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Userspace hasn't passed us a channel_hint for a long long time now, and
there isn't actually a need to do so anymore anyway.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Upcoming patches are going to enable full support for buffers that keep
a constant GPU virtual address whenever they're validated for use by
the GPU.
In order for this to work properly while keeping support for large pages,
we need to know if it's ever going to be possible for a buffer to end
up in GART, and if so, disable large pages for the buffer's VMA.
This is a new restriction that's not present in earlier kernel's, but
should not break userspace as the current code never attempts to validate
buffers into a memtype other than it was created with.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
'mappable' isn't really used at all, nor is it necessary anymore as the
bo code is capable of moving buffers to mappable vram as required.
'no_vm' isn't necessary anymore either, any places that don't want to be
mapped into a GPU address space should allocate the VRAM directly instead.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
NVC0 will be able to share some of nv50's paths this way. This also makes
it the card-specific vram code responsible for deciding if a given set
of tile_flags is valid, rather than duplicating the allowed types in
nv50_vram.c and nouveau_gem.c
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
nv0x-nv4x should be mostly fine, nv50 doesn't work yet.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
nouveau_fence_* functions are not type safe, which could lead to bugs.
Additionally every use of nouveau_fence_unref had to cast struct
nouveau_fence to void **.
Fix it by renaming old functions and creating static inline functions with
new prototypes. We still need old functions, because we pass function
pointers to ttm.
As we are wrapping functions, drop unused "void *arg" parameter where possible.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
No other driver uses this, and userspace should be responsible for handling
locking between them if they share BOs.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The bo lock used only to protect the bo sync object members, and since it
is a per bo lock, fencing a buffer list will see a lot of locks and unlocks.
Replace it with a per-device lock that protects the sync object members on
*all* bos. Reading and setting these members will always be very quick, so
the risc of heavy lock contention is microscopic. Note that waiting for
sync objects will always take place outside of this lock.
The bo device fence lock will eventually be replaced with a seqlock /
rcu mechanism so we can determine that a bo is idle under a
rcu / read seqlock.
However this change will allow us to batch fencing and unreserving of
buffers with a minimal amount of locking.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jerome Glisse <j.glisse@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This will be needed for Z compression and to take smarter placement
decisions.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Acked-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
[airlied - add fix for vmwgfx build]
* 'nouveau/for-airlied' of ../drm-nouveau-next: (93 commits)
drm/ttm: restructure to allow driver to plug in alternate memory manager
drm/ttm: introduce utility function to free an allocated memory node
drm/nouveau: fix thinkos in mem timing table recordlen check
drm/nouveau: parse voltage from perf 0x40 entires
drm/nouveau: don't use the default pll limits in table v2.1 on nv50+ cards
drm/nv50: Fix large 3D performance regression caused by the interchannel sync patches.
drm/nouveau: Synchronize buffer object moves in hardware.
drm/nouveau: Use semaphores to handle inter-channel sync in hardware.
drm/nouveau: Provide a means to have arbitrary work run on fence completion.
drm/nouveau: Minor refactoring/cleanup of the fence code.
drm/nouveau: Add a module option to force card POST.
drm/nv50: prevent (IB_PUT == IB_GET) for occurring unless idle
drm/nv0x-nv4x: Leave the 0x40 bit untouched when changing CRE_LCD.
drm/nv30-nv40: Fix postdivider mask when writing engine/memory PLLs.
drm/nouveau: Fix perf table parsing on BMP v5.25.
drm/nouveau: fix required mode bandwidth calculation for DP
drm/nouveau: fix typo in c2aa91afea5f7e7ae4530fabd37414a79c03328c
drm/nva3: split pm backend out from nv50
drm/nouveau: run perflvl and M table scripts on mem clock change
drm/nouveau: pass perflvl struct to clock_pre()
...
There were lots of places being inconsistent since handle count
looked like a kref but it really wasn't.
Fix this my just making handle count an atomic on the object,
and have it increase the normal object kref.
Now i915/radeon/nouveau drivers can drop the normal reference on
userspace object creation, and have the handle hold it.
This patch fixes a memory leak or corruption on unload, because
the driver had no way of knowing if a handle had been actually
added for this object, and the fbcon object needed to know this
to clean itself up properly.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
* 'nouveau/for-airlied' of /ssd/git/drm-nouveau-next:
drm/nv50: initialize ramht_refs list for faked 0 channel
drm/nouveau: Don't take struct_mutex around the pushbuf IOCTL.
drm/nouveau: Take fence spinlock before reading the last sequence.
We don't need it and it can lead to lock order inversions with respect
to drm_global_mutex, potentially causing dead locks.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
* 'nouveau/for-airlied' of /ssd/git/drm-nouveau-next:
drm/nouveau: drop drm_global_mutex before sleeping in submission path
drm: export drm_global_mutex for drivers to use
drm/nv20: Don't use pushbuf calls on the original nv20.
drm/nouveau: Fix TMDS on some DCB1.5 boards.
drm/nouveau: Fix backlight control on PPC machines with an internal TMDS panel.
drm/nv30: Apply modesetting to the correct slave encoder
drm/nouveau: Use a helper function to match PCI device/subsystem IDs.
drm/nv50: add dcb type 14 to enum to prevent compiler complaint
If we keep hold of the mutex here, the process which currently holds the
buffer object will never be able to release it, causing a deadlock.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The "return" command is buggy on the original nv20, it jumps back to
the caller address as expected, but it doesn't clear the subroutine
active bit making the subsequent pushbuf calls fail with a "stack"
overflow.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
* 'nouveau/for-airlied' of /ssd/git/drm-nouveau-next:
drm/nouveau: fix earlier mistake when fixing merge conflict
drm/nvc0: fix thinko in instmem suspend/resume
drm/nouveau: Workaround missing GPIO tables on an Apple iMac G4 NV18.
drm/nouveau: Add TV-out quirk for an MSI nForce2 IGP.
drm/nv50-nvc0: ramht_size is meant to be in bytes, not entries
drm/nouveau: punt some more log messages to debug level
drm/nouveau: remove warning about unknown tmds table revisions
drm/nouveau: check for error when allocating/mapping dummy page
drm/nouveau: fix race condition when under memory pressure
drm/nv50: fix minor thinko from nvc0 changes
drm/nouveau: Don't try DDC on the dummy I2C channel.
When VRAM is running out it's possible that the client's push buffers get
evicted to main memory. When they're validated back in, the GPU may
be used for the copy back to VRAM, but the existing synchronisation code
only deals with inter-channel sync, not sync between PFIFO and PGRAPH on
the same channel. This leads to PFIFO fetching from command buffers that
haven't quite been copied by PGRAPH yet.
This patch marks push buffers as so, and forces any GPU-assisted buffer
moves to be done on a different channel, which triggers the correct
synchronisation to happen before we submit them.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is consistent with trying to access a filename that not exist
within a directory which is a good analogy here. The main reason for the
change is that it is easy to confuse the error code of EBADF as an
performing an ioctl on an invalid file descriptor (rather than an
unknown object).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
* drm-ttm-unmappable:
drm/radeon/kms: enable use of unmappable VRAM V2
drm/ttm: remove io_ field from TTM V6
drm/vmwgfx: add support for new TTM fault callback V5
drm/nouveau/kms: add support for new TTM fault callback V5
drm/radeon/kms: add support for new fault callback V7
drm/ttm: ttm_fault callback to allow driver to handle bo placement V6
drm/ttm: split no_wait argument in 2 GPU or reserve wait
Conflicts:
drivers/gpu/drm/nouveau/nouveau_bo.c
When drivers embed the core gem object into their own structures,
they'll have to do this. Temporarily this results in an ugly
kfree(gem_obj);
in every gem driver.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously we were filling it the same as "placements", but in some
cases there're valid alternatives that we were ignoring completely.
Keeping a back-up memory type helps on several low-mem situations.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
There is case where we want to be able to wait only for the
GPU while not waiting for other buffer to be unreserved. This
patch split the no_wait argument all the way down in the whole
ttm path so that upper level can decide on what to wait on or
not.
[airlied: squashed these 4 for bisectability reasons.]
drm/radeon/kms: update to TTM no_wait splitted argument
drm/nouveau: update to TTM no_wait splitted argument
drm/vmwgfx: update to TTM no_wait splitted argument
[vmwgfx patch: Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>]
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Acked-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
* nouveau/for-airlied: (25 commits)
drm/nouveau: use ALIGN instead of open coding it
drm/nouveau: report unknown connector state if lid closed
drm/nouveau: support version 0x20 displayport tables
drm/nouveau: Fix noaccel/nofbaccel option descriptions.
drm/nv50: Implement ctxprog/state generation.
drm/nouveau: use dcb connector types throughout the driver
drm/nv50: enable hpd on any connector we know the gpio line for
drm/nouveau: use dcb connector table for creating drm connectors
drm/nouveau: construct a connector table for cards that lack a real one
drm/nouveau: check for known dcb connector types
drm/nouveau: parse dcb gpio/connector tables after encoders
drm/nouveau: reorganise bios header, add dcb connector type enums
drm/nouveau: merge nvbios and nouveau_bios_info
drm/nouveau: merge parsed_dcb and bios_parsed_dcb into dcb_table
drm/nouveau: rename parsed_dcb_gpio to dcb_gpio_table
drm/nouveau: allow retrieval of vbios image from debugfs
drm/nouveau: fix missing spin_unlock in failure path
drm/nouveau: fix i2ctable bounds checking
drm/nouveau: fix nouveau_i2c_find bounds checking
drm/nouveau: fix pramdac_table range checking
...
Conflicts:
drivers/gpu/drm/nouveau/nouveau_gem.c
Found by sparse.
Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This commit breaks the userspace interface, and requires a new libdrm for
nouveau to operate again.
The multiple GEM_PUSHBUF ioctls that were present in 0.0.15 for
compatibility purposes are now gone, and replaced with the new ioctl which
allows for multiple push buffers to be submitted (necessary for hw index
buffers in the nv50 3d driver) and relocations to be applied on any buffer.
A number of other ioctls (CARD_INIT, GEM_PIN, GEM_UNPIN) that were needed
for userspace modesetting have also been removed.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
PFIFO on G80 and up has a new mode where the main ring buffer is simply a
ring of pointers to indirect buffers containing the actual command/data
packets. In order to be able to implement index buffers in the 3D driver
we need to be able to submit data-only push buffers right after the cmd
packet header, which is only possible using the new command submission
method.
This commit doesn't make it possible to implement index buffers yet, some
userspace interface changes will be required, but it does allow for
testing/debugging of the hardware-side support in the meantime.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Mostly obvious simplifications.
The i915 pread/pwrite ioctls, intel_overlay_put_image and
nouveau_gem_new were incorrectly using the locked versions
without locking: this is also fixed in this patch.
Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
nouveau_gem_ioctl_cpu_prep calls ttm_bo_wait without the bo lock held.
ttm_bo_wait unlocks that lock, and so must be called with it held.
Currently this bug causes libdrm nouveau_bo_busy() to hang the machine.
Signed-off-by: Luca Barbieri <luca at luca-barbieri.com>
Acked-by: Maarten Maathuis <madman2003@gmail.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We need to add the buffer to the list even if we fail, otherwise the
validate_fini() call won't unreserve + unreference the GEM object,
making TTM very unhappy.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Currently there is no check that the pushbuffer request bounds are inside
the TTM BO.
This allows to instruct the kernel to do relocations on user-selected
addresses, since the relocation bounds checking relies on the request
bounds.
This can oops the kernel accidentally and is easily exploitable.
This patch adds bound checking and alignment checking for ->offset and
->nr_dwords.
It also makes some variables unsigned, which should have no effect,
but prevents possible bounds checking problems.
Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>