drm-misc-next for v5.15:
UAPI Changes:
Cross-subsystem Changes:
- udmabuf: Add support for mapping hugepages
- Add dma-buf stats to sysfs.
- Assorted fixes to fbdev/omap2.
- dma-buf: Document DMA_BUF_IOCTL_SYNC
- Improve dma-buf non-dynamic exporter expectations better.
- Add module parameters for dma-buf size and list limit.
- Add HDMI codec support to vc4, to replace vc4's own codec.
- Document dma-buf implicit fencing rules.
- dma_resv_test_signaled test_all handling.
Core Changes:
- Extract i915's eDP backlight code into DRM helpers.
- Assorted docbook updates.
- Rework drm_dp_aux documentation.
- Add support for the DP aux bus.
- Shrink dma-fence-chain slightly.
- Add alloc/free helpers for dma-fence-chain.
- Assorted fixes to TTM., drm/of, bridge
- drm_gem_plane_helper_prepare/cleanup_fb is now the default for gem drivers.
- Small fix for scheduler completion.
- Remove use of drm_device.irq_enabled.
- Print the driver name to dmesg when registering framebuffer.
- Export drm/gem's shadow plane handling, and use it in vkms.
- Assorted small fixes.
Driver Changes:
- Add eDP backlight to nouveau.
- Assorted fixes and cleanups to nouveau, panfrost, vmwgfx, anx7625,
amdgpu, gma500, radeon, mgag200, vgem, vc4, vkms, omapdrm.
- Add support for Samsung DB7430, Samsung ATNA33XC20, EDT ETMV570G2DHU,
EDT ETM0350G0DH6, Innolux EJ030NA panels.
- Fix some simple pannels missing bus_format and connector types.
- Add mks-guest-stats instrumentation support to vmwgfx.
- Merge i915-ttm topic branch.
- Make s6e63m0 panel use Mipi-DBI helpers.
- Add detect() supoprt for AST.
- Use interrupts for hotplug on vc4.
- vmwgfx is now moved to drm-misc-next, as sroland is no longer a maintainer for now.
- vmwgfx now uses copies of vmware's internal device headers.
- Slowly convert ti-sn65dsi83 over to atomic.
- Rework amdgpu dma-resv handling.
- Fix virtio fencing for planes.
- Ensure amdgpu can always evict to SYSTEM.
- Many drivers fixed for implicit fencing rules.
- Set default prepare/cleanup fb for tiny, vram and simple helpers too.
- Rework panfrost gpu reset and related serialization.
- Update VKMS todo list.
- Make bochs a tiny gpu driver, and use vram helper.
- Use linux irq interfaces instead of drm_irq in some drivers.
- Add support for Raspberry Pi Pico to GUD.
Signed-off-by: Dave Airlie <airlied@redhat.com>
# gpg: Signature made Fri 16 Jul 2021 21:06:04 AEST
# gpg: using RSA key B97BD6A80CAC4981091AE547FE558C72A67013C3
# gpg: Good signature from "Maarten Lankhorst <maarten.lankhorst@linux.intel.com>" [expired]
# gpg: aka "Maarten Lankhorst <maarten@debian.org>" [expired]
# gpg: aka "Maarten Lankhorst <maarten.lankhorst@canonical.com>" [expired]
# gpg: Note: This key has expired!
# Primary key fingerprint: B97B D6A8 0CAC 4981 091A E547 FE55 8C72 A670 13C3
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/444811c3-cbec-e9d5-9a6b-9632eda7962a@linux.intel.com
The V3D engine has several hardware performance counters that can of
interest for userspace performance analysis tools.
This exposes new ioctls to create and destroy performance monitor
objects, as well as to query the counter values.
Each created performance monitor object has an ID that can be attached
to CL/CSD submissions, so the driver enables the requested counters when
the job is submitted, and updates the performance monitor values when
the job is done.
It is up to the user to ensure all the jobs have been finished before
getting the performance monitor values. It is also up to the user to
properly synchronize BCL jobs when submitting jobs with different
performance monitors attached.
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Emma Anholt <emma@anholt.net>
To: dri-devel@lists.freedesktop.org
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Acked-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Melissa Wen <melissa.srw@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210608111541.461991-1-jasuarez@igalia.com
drm_file->master pointers should be protected by
drm_device.master_mutex or drm_file.master_lookup_lock when being
dereferenced.
However, in drm_lease.c, there are multiple instances where
drm_file->master is accessed and dereferenced while neither lock is
held. This makes drm_lease.c vulnerable to use-after-free bugs.
We address this issue in 2 ways:
1. Add a new drm_file_get_master() function that calls drm_master_get
on drm_file->master while holding on to
drm_file.master_lookup_lock. Since drm_master_get increments the
reference count of master, this prevents master from being freed until
we unreference it with drm_master_put.
2. In each case where drm_file->master is directly accessed and
eventually dereferenced in drm_lease.c, we wrap the access in a call
to the new drm_file_get_master function, then unreference the master
pointer once we are done using it.
Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210712043508.11584-6-desmondcheongzx@gmail.com
Currently, drm_file.master pointers should be protected by
drm_device.master_mutex when being dereferenced. This is because
drm_file.master is not invariant for the lifetime of drm_file. If
drm_file is not the creator of master, then drm_file.is_master is
false, and a call to drm_setmaster_ioctl will invoke
drm_new_set_master, which then allocates a new master for drm_file and
puts the old master.
Thus, without holding drm_device.master_mutex, the old value of
drm_file.master could be freed while it is being used by another
concurrent process.
However, it is not always possible to lock drm_device.master_mutex to
dereference drm_file.master. Through the fbdev emulation code, this
might occur in a deep nest of other locks. But drm_device.master_mutex
is also the outermost lock in the nesting hierarchy, so this leads to
potential deadlocks.
To address this, we introduce a new spin lock at the bottom of the
lock hierarchy that only serializes drm_file.master. With this change,
the value of drm_file.master changes only when both
drm_device.master_mutex and drm_file.master_lookup_lock are
held. Hence, any process holding either of those locks can ensure that
the value of drm_file.master will not change concurrently.
Since no lock depends on the new drm_file.master_lookup_lock, when
drm_file.master is dereferenced, but drm_device.master_mutex cannot be
held, we can safely protect the master pointer with
drm_file.master_lookup_lock.
Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210712043508.11584-5-desmondcheongzx@gmail.com
While checking the master status of the DRM file in
drm_is_current_master(), the device's master mutex should be
held. Without the mutex, the pointer fpriv->master may be freed
concurrently by another process calling drm_setmaster_ioctl(). This
could lead to use-after-free errors when the pointer is subsequently
dereferenced in drm_lease_owner().
The callers of drm_is_current_master() from drm_auth.c hold the
device's master mutex, but external callers do not. Hence, we implement
drm_is_current_master_locked() to be used within drm_auth.c, and
modify drm_is_current_master() to grab the device's master mutex
before checking the master status.
Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210712043508.11584-4-desmondcheongzx@gmail.com
Inside drm_clients_info, the rcu_read_lock is held to lock
pid_task()->comm. However, within this protected section, a call to
drm_is_current_master is made, which involves a mutex lock in a future
patch. However, this is illegal because the mutex lock might block
while in the RCU read-side critical section.
Since drm_is_current_master isn't protected by rcu_read_lock, we avoid
this by moving it out of the RCU critical section.
The following report came from intel-gfx ci's
igt@debugfs_test@read_all_entries testcase:
=============================
[ BUG: Invalid wait context ]
5.13.0-CI-Patchwork_20515+ #1 Tainted: G W
-----------------------------
debugfs_test/1101 is trying to lock:
ffff888132d901a8 (&dev->master_mutex){+.+.}-{3:3}, at:
drm_is_current_master+0x1e/0x50
other info that might help us debug this:
context-{4:4}
3 locks held by debugfs_test/1101:
#0: ffff88810fdffc90 (&p->lock){+.+.}-{3:3}, at:
seq_read_iter+0x53/0x3b0
#1: ffff888132d90240 (&dev->filelist_mutex){+.+.}-{3:3}, at:
drm_clients_info+0x63/0x2a0
#2: ffffffff82734220 (rcu_read_lock){....}-{1:2}, at:
drm_clients_info+0x1b1/0x2a0
stack backtrace:
CPU: 8 PID: 1101 Comm: debugfs_test Tainted: G W
5.13.0-CI-Patchwork_20515+ #1
Hardware name: Intel Corporation CometLake Client Platform/CometLake S
UDIMM (ERB/CRB), BIOS CMLSFWR1.R00.1263.D00.1906260926 06/26/2019
Call Trace:
dump_stack+0x7f/0xad
__lock_acquire.cold.78+0x2af/0x2ca
lock_acquire+0xd3/0x300
? drm_is_current_master+0x1e/0x50
? __mutex_lock+0x76/0x970
? lockdep_hardirqs_on+0xbf/0x130
__mutex_lock+0xab/0x970
? drm_is_current_master+0x1e/0x50
? drm_is_current_master+0x1e/0x50
? drm_is_current_master+0x1e/0x50
drm_is_current_master+0x1e/0x50
drm_clients_info+0x107/0x2a0
seq_read_iter+0x178/0x3b0
seq_read+0x104/0x150
full_proxy_read+0x4e/0x80
vfs_read+0xa5/0x1b0
ksys_read+0x5a/0xd0
do_syscall_64+0x39/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210712043508.11584-3-desmondcheongzx@gmail.com
In preparation for a future patch to take a lock on
drm_device.master_mutex inside drm_is_current_master(), we first move
the call to drm_is_current_master() in drm_mode_getconnector out from the
section locked by &dev->mode_config.mutex. This avoids creating a
circular lock dependency.
Failing to avoid this lock dependency produces the following lockdep
splat:
======================================================
WARNING: possible circular locking dependency detected
5.13.0-rc7-CI-CI_DRM_10254+ #1 Not tainted
------------------------------------------------------
kms_frontbuffer/1087 is trying to acquire lock:
ffff88810dcd01a8 (&dev->master_mutex){+.+.}-{3:3}, at: drm_is_current_master+0x1b/0x40
but task is already holding lock:
ffff88810dcd0488 (&dev->mode_config.mutex){+.+.}-{3:3}, at: drm_mode_getconnector+0x1c6/0x4a0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&dev->mode_config.mutex){+.+.}-{3:3}:
__mutex_lock+0xab/0x970
drm_client_modeset_probe+0x22e/0xca0
__drm_fb_helper_initial_config_and_unlock+0x42/0x540
intel_fbdev_initial_config+0xf/0x20 [i915]
async_run_entry_fn+0x28/0x130
process_one_work+0x26d/0x5c0
worker_thread+0x37/0x380
kthread+0x144/0x170
ret_from_fork+0x1f/0x30
-> #1 (&client->modeset_mutex){+.+.}-{3:3}:
__mutex_lock+0xab/0x970
drm_client_modeset_commit_locked+0x1c/0x180
drm_client_modeset_commit+0x1c/0x40
__drm_fb_helper_restore_fbdev_mode_unlocked+0x88/0xb0
drm_fb_helper_set_par+0x34/0x40
intel_fbdev_set_par+0x11/0x40 [i915]
fbcon_init+0x270/0x4f0
visual_init+0xc6/0x130
do_bind_con_driver+0x1e5/0x2d0
do_take_over_console+0x10e/0x180
do_fbcon_takeover+0x53/0xb0
register_framebuffer+0x22d/0x310
__drm_fb_helper_initial_config_and_unlock+0x36c/0x540
intel_fbdev_initial_config+0xf/0x20 [i915]
async_run_entry_fn+0x28/0x130
process_one_work+0x26d/0x5c0
worker_thread+0x37/0x380
kthread+0x144/0x170
ret_from_fork+0x1f/0x30
-> #0 (&dev->master_mutex){+.+.}-{3:3}:
__lock_acquire+0x151e/0x2590
lock_acquire+0xd1/0x3d0
__mutex_lock+0xab/0x970
drm_is_current_master+0x1b/0x40
drm_mode_getconnector+0x37e/0x4a0
drm_ioctl_kernel+0xa8/0xf0
drm_ioctl+0x1e8/0x390
__x64_sys_ioctl+0x6a/0xa0
do_syscall_64+0x39/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
other info that might help us debug this:
Chain exists of: &dev->master_mutex --> &client->modeset_mutex --> &dev->mode_config.mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&dev->mode_config.mutex);
lock(&client->modeset_mutex);
lock(&dev->mode_config.mutex);
lock(&dev->master_mutex);
*** DEADLOCK ***
1 lock held by kms_frontbuffer/1087:
#0: ffff88810dcd0488 (&dev->mode_config.mutex){+.+.}-{3:3}, at: drm_mode_getconnector+0x1c6/0x4a0
stack backtrace:
CPU: 7 PID: 1087 Comm: kms_frontbuffer Not tainted 5.13.0-rc7-CI-CI_DRM_10254+ #1
Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3234.A01.1906141750 06/14/2019
Call Trace:
dump_stack+0x7f/0xad
check_noncircular+0x12e/0x150
__lock_acquire+0x151e/0x2590
lock_acquire+0xd1/0x3d0
__mutex_lock+0xab/0x970
drm_is_current_master+0x1b/0x40
drm_mode_getconnector+0x37e/0x4a0
drm_ioctl_kernel+0xa8/0xf0
drm_ioctl+0x1e8/0x390
__x64_sys_ioctl+0x6a/0xa0
do_syscall_64+0x39/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210712043508.11584-2-desmondcheongzx@gmail.com
drm: Return -ENOTTY for non-drm ioctls
Return -ENOTTY from drm_ioctl() when userspace passes in a cmd number
which doesn't relate to the drm subsystem.
Glibc uses the TCGETS ioctl to implement isatty(), and without this
change isatty() returns it incorrectly returns true for drm devices.
To test run this command:
$ if [ -t 0 ]; then echo is a tty; fi < /dev/dri/card0
which shows "is a tty" without this patch.
This may also modify memory which the userspace application is not
expecting.
Signed-off-by: Charles Baylis <cb-kernel@fishzet.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/YPG3IBlzaMhfPqCr@stando.fishzet.co.uk
[Bug][AST2500]
V1:
When AST2500 acts as stand-alone VGA so that DRAM and DVO initialization
have to be achieved by VGA driver with P2A (PCI to AHB) enabling.
However, HW suggests disable Fast reset mode after DRAM initializaton,
because fast reset mode is mainly designed for ARM ICE debugger.
Once Fast reset is checked as enabling, WDT (Watch Dog Timer) should be
first enabled to avoid system deadlock before disable fast reset mode.
V2:
Use to_pci_dev() to get revision of PCI configuration.
V3:
If SCU00 is not unlocked, just enter its password again.
It is unnecessary to clear AHB lock condition and restore WDT default
setting again, before Fast-reset clearing.
V4:
repatch after "error : could not build fake ancestor" resolved.
V5:
Since CVE_2019_6260 item3, Most of AST2500 have disabled P2A(PCIe to AMBA).
However, for backward compatibility, some patches about P2A, such as items
of v5.2 and v5.3, are considered to be upstreamed with comments.
1. Add define macro to improve source readability.
ast_drv.h, ast_main.c, ast_post.c
2. Add comment about "Fast restet" is enabled for ARM-ICE debugger
ast_post.c
3. Add comment about Reset USB port to patch USB unknown device issue
ast_post.c
Signed-off-by: KuoHsiang Chou <kuohsiang_chou@aspeedtech.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210709080900.4056-1-kuohsiang_chou@aspeedtech.com
This reverts commit 9e31c1fe45. Ever
since that commit, we've been having issues where a hang in one client
can propagate to another. In particular, a hang in an app can propagate
to the X server which causes the whole desktop to lock up.
Error propagation along fences sound like a good idea, but as your bug
shows, surprising consequences, since propagating errors across security
boundaries is not a good thing.
What we do have is track the hangs on the ctx, and report information to
userspace using RESET_STATS. That's how arb_robustness works. Also, if my
understanding is still correct, the EIO from execbuf is when your context
is banned (because not recoverable or too many hangs). And in all these
cases it's up to userspace to figure out what is all impacted and should
be reported to the application, that's not on the kernel to guess and
automatically propagate.
What's more, we're also building more features on top of ctx error
reporting with RESET_STATS ioctl: Encrypted buffers use the same, and the
userspace fence wait also relies on that mechanism. So it is the path
going forward for reporting gpu hangs and resets to userspace.
So all together that's why I think we should just bury this idea again as
not quite the direction we want to go to, hence why I think the revert is
the right option here.
For backporters: Please note that you _must_ have a backport of
https://lore.kernel.org/dri-devel/20210602164149.391653-2-jason@jlekstrand.net/
for otherwise backporting just this patch opens up a security bug.
v2: Augment commit message. Also restore Jason's sob that I
accidentally lost.
v3: Add a note for backporters
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reported-by: Marcin Slusarz <marcin.slusarz@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Marcin Slusarz <marcin.slusarz@intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
Fixes: 9e31c1fe45 ("drm/i915: Propagate errors on awaiting already signaled fences")
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210714193419.1459723-3-jason@jlekstrand.net
(cherry picked from commit 93a2711cdd)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
This reverts 686c7c35ab ("drm/i915/gem: Asynchronous cmdparser"). The
justification for this commit in the git history was a vague comment
about getting it out from under the struct_mutex. While this may
improve perf for some workloads on Gen7 platforms where we rely on the
command parser for features such as indirect rendering, no numbers were
provided to prove such an improvement. It claims to closed two
gitlab/bugzilla issues but with no explanation whatsoever as to why or
what bug it's fixing.
Meanwhile, by moving command parsing off to an async callback, it leaves
us with a problem of what to do on error. When things were synchronous,
EXECBUFFER2 would fail with an error code if parsing failed. When
moving it to async, we needed another way to handle that error and the
solution employed was to set an error on the dma_fence and then trust
that said error gets propagated to the client eventually. Moving back
to synchronous will help us untangle the fence error propagation mess.
This also reverts most of 0edbb9ba1b ("drm/i915: Move cmd parser
pinning to execbuffer") which is a refactor of some of our allocation
paths for asynchronous parsing. Now that everything is synchronous, we
don't need it.
v2 (Daniel Vetter):
- Add stabel Cc and Fixes tag
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: <stable@vger.kernel.org> # v5.6+
Fixes: 9e31c1fe45 ("drm/i915: Propagate errors on awaiting already signaled fences")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210714193419.1459723-2-jason@jlekstrand.net
(cherry picked from commit 93b7133041)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
As dw-mipi-dsi supported all possible ways to find the DSI
devices. It can take multiple iterations for ltdc to find
all components attached to the DSI bridge.
The current ltdc driver failed to find the endpoint as
it returned -EINVAL for the first iteration itself. This leads
to following error:
[ 3.099289] [drm:ltdc_load] *ERROR* init encoder endpoint 0
So, check the return value and cleanup the encoder only if it's
not -EPROBE_DEFER. This make all components in the attached DSI
bridge found properly.
Signed-off-by: Jagan Teki <jagan@amarulasolutions.com>
Tested-by: Yannick Fertre <yannick.fertre@foss.st.com>
Acked-by: Philippe Cornu <philippe.cornu@foss.st.com>
Signed-off-by: Philippe Cornu <philippe.cornu@foss.st.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210704135914.268308-1-jagan@amarulasolutions.com
This adds a driver for panels based on the WideChips WS2401 display
controller. This display controller is used in the Samsung LMS380KF01
display found in the Samsung GT-I8160 (Codina) mobile phone and
possibly others.
As is common with Samsung displays manufacturer commands are necessary
to configure the display to a working state.
The display optionally supports internal backlight control, but can
also use an external backlight.
This driver re-uses the DBI infrastructure to communicate with the
display.
Cc: phone-devel@vger.kernel.org
Cc: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20210714225002.1065107-2-linus.walleij@linaro.org
This reverts commit 9e31c1fe45. Ever
since that commit, we've been having issues where a hang in one client
can propagate to another. In particular, a hang in an app can propagate
to the X server which causes the whole desktop to lock up.
Error propagation along fences sound like a good idea, but as your bug
shows, surprising consequences, since propagating errors across security
boundaries is not a good thing.
What we do have is track the hangs on the ctx, and report information to
userspace using RESET_STATS. That's how arb_robustness works. Also, if my
understanding is still correct, the EIO from execbuf is when your context
is banned (because not recoverable or too many hangs). And in all these
cases it's up to userspace to figure out what is all impacted and should
be reported to the application, that's not on the kernel to guess and
automatically propagate.
What's more, we're also building more features on top of ctx error
reporting with RESET_STATS ioctl: Encrypted buffers use the same, and the
userspace fence wait also relies on that mechanism. So it is the path
going forward for reporting gpu hangs and resets to userspace.
So all together that's why I think we should just bury this idea again as
not quite the direction we want to go to, hence why I think the revert is
the right option here.
For backporters: Please note that you _must_ have a backport of
https://lore.kernel.org/dri-devel/20210602164149.391653-2-jason@jlekstrand.net/
for otherwise backporting just this patch opens up a security bug.
v2: Augment commit message. Also restore Jason's sob that I
accidentally lost.
v3: Add a note for backporters
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reported-by: Marcin Slusarz <marcin.slusarz@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Marcin Slusarz <marcin.slusarz@intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
Fixes: 9e31c1fe45 ("drm/i915: Propagate errors on awaiting already signaled fences")
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210714193419.1459723-3-jason@jlekstrand.net
This reverts 686c7c35ab ("drm/i915/gem: Asynchronous cmdparser"). The
justification for this commit in the git history was a vague comment
about getting it out from under the struct_mutex. While this may
improve perf for some workloads on Gen7 platforms where we rely on the
command parser for features such as indirect rendering, no numbers were
provided to prove such an improvement. It claims to closed two
gitlab/bugzilla issues but with no explanation whatsoever as to why or
what bug it's fixing.
Meanwhile, by moving command parsing off to an async callback, it leaves
us with a problem of what to do on error. When things were synchronous,
EXECBUFFER2 would fail with an error code if parsing failed. When
moving it to async, we needed another way to handle that error and the
solution employed was to set an error on the dma_fence and then trust
that said error gets propagated to the client eventually. Moving back
to synchronous will help us untangle the fence error propagation mess.
This also reverts most of 0edbb9ba1b ("drm/i915: Move cmd parser
pinning to execbuffer") which is a refactor of some of our allocation
paths for asynchronous parsing. Now that everything is synchronous, we
don't need it.
v2 (Daniel Vetter):
- Add stabel Cc and Fixes tag
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: <stable@vger.kernel.org> # v5.6+
Fixes: 9e31c1fe45 ("drm/i915: Propagate errors on awaiting already signaled fences")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210714193419.1459723-2-jason@jlekstrand.net
Commit 72a7cf0aec ("drm/amd/display: Keep linebuffer pixel depth at
30bpp for DCE-11.0.") doesn't seems to have fixed 10bit 4K rendering over
DisplayPort for CIK GPUs. On my machine with a HAWAII GPU I get a broken
image that looks like it has an effective resolution of 1920x1080 but
scaled up in an irregular way. Reverting the commit or applying this
patch fixes the problem on v5.14-rc1.
Fixes: 72a7cf0aec ("drm/amd/display: Keep linebuffer pixel depth at 30bpp for DCE-11.0.")
Acked-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Liviu Dudau <liviu@dudau.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull drm fixes from Dave Airlie:
"Regular rc2 fixes though a bit more than usual at rc2 stage, people
must have been testing early or else some fixes from last week got a
bit laggy.
There is one larger change in the amd fixes to amalgamate some power
management code on the newer chips with the code from the older chips,
it should only affects chips where support was introduced in rc1 and
it should make future fixes easier to maintain probably a good idea to
merge it now.
Otherwise it's mostly fixes across the board.
dma-buf:
- Fix fence leak in sync_file_merge() error code
drm/panel:
- nt35510: Don't fail on DSI reads
fbdev:
- Avoid use-after-free by not deleting current video mode
ttm:
- Avoid NULL-ptr deref in ttm_range_man_fini()
vmwgfx:
- Fix a merge commit
qxl:
- fix a TTM regression
amdgpu:
- SR-IOV fixes
- RAS fixes
- eDP fixes
- SMU13 code unification to facilitate fixes in the future
- Add new renoir DID
- Yellow Carp fixes
- Beige Goby fixes
- Revert a bunch of TLB fixes that caused regressions
- Revert an LTTPR display regression
amdkfd
- Fix VRAM access regression
- SVM fixes
i915:
- Fix -EDEADLK handling regression
- Drop the page table optimisation"
* tag 'drm-fixes-2021-07-16' of git://anongit.freedesktop.org/drm/drm: (29 commits)
drm/amdgpu: add another Renoir DID
drm/ttm: add a check against null pointer dereference
drm/i915/gtt: drop the page table optimisation
drm/i915/gt: Fix -EDEADLK handling regression
drm/amd/pm: Add waiting for response of mode-reset message for yellow carp
Revert "drm/amdkfd: Add heavy-weight TLB flush after unmapping"
Revert "drm/amdgpu: Add table_freed parameter to amdgpu_vm_bo_update"
Revert "drm/amdkfd: Make TLB flush conditional on mapping"
Revert "drm/amdgpu: Fix warning of Function parameter or member not described"
Revert "drm/amdkfd: Add memory sync before TLB flush on unmap"
drm/amd/pm: Fix BACO state setting for Beige_Goby
drm/amdgpu: Restore msix after FLR
drm/amdkfd: Allow CPU access for all VRAM BOs
drm/amdgpu/display - only update eDP's backlight level when necessary
drm/amdkfd: handle fault counters on invalid address
drm/amdgpu: Correct the irq numbers for virtual crtc
drm/amd/display: update header file name
drm/amd/pm: drop smu_v13_0_1.c|h files for yellow carp
drm/amd/display: remove faulty assert
Revert "drm/amd/display: Always write repeater mode regardless of LTTPR"
...
Commit 72a7cf0aec ("drm/amd/display: Keep linebuffer pixel depth at
30bpp for DCE-11.0.") doesn't seems to have fixed 10bit 4K rendering over
DisplayPort for CIK GPUs. On my machine with a HAWAII GPU I get a broken
image that looks like it has an effective resolution of 1920x1080 but
scaled up in an irregular way. Reverting the commit or applying this
patch fixes the problem on v5.14-rc1.
Fixes: 72a7cf0aec ("drm/amd/display: Keep linebuffer pixel depth at 30bpp for DCE-11.0.")
Acked-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Liviu Dudau <liviu@dudau.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1. using vram aper to access vram if possible
2. avoid MM_INDEX/MM_DATA is not working when mmio protect feature is
enabled.
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
split amdgpu_device_access_vram()
1. amdgpu_device_mm_access(): using MM_INDEX/MM_DATA to access vram
2. amdgpu_device_aper_access(): using vram aperature to access vram (option)
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Don't populate the const array common_rates on the stack but instead it
static. Makes the object code smaller by 80 bytes:
Before:
text data bss dec hex filename
268019 98322 256 366597 59805 ../display/amdgpu_dm/amdgpu_dm.o
After:
text data bss dec hex filename
267843 98418 256 366517 597b5 ../display/amdgpu_dm/amdgpu_dm.o
Reduction of 80 bytes
(gcc version 10.3.0)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>