We employ a "waitboost" heuristic to detect when userspace is stalled
waiting for results from earlier execution. Under latency sensitive work
mixed between the gpu/cpu, the GPU is typically under-utilised and so
RPS sees that low utilisation as a reason to downclock the frequency,
causing longer stalls and lower throughput. The user left waiting for
the results is not impressed.
On applying commit 047a1b877e ("dma-buf & drm/amdgpu: remove dma_resv
workaround") it was observed that deinterlacing h264 on Haswell
performance dropped by 2-5x. The reason being that the natural workload
was not intense enough to trigger RPS (using HW evaluation intervals) to
upclock, and so it was depending on waitboosting for the throughput.
Commit 047a1b877e ("dma-buf & drm/amdgpu: remove dma_resv workaround")
changes the composition of dma-resv from keeping a single write fence +
multiple read fences, to a single array of multiple write and read
fences (a maximum of one pair of write/read fences per context). The
iteration order was also changed implicitly from all-read fences then
the single write fence, to a mix of write fences followed by read
fences. It is that ordering change that belied the fragility of
waitboosting.
Currently, a waitboost is inspected at the point of waiting on an
outstanding fence. If the GPU is backlogged such that we haven't yet
stated the request we need to wait on, we force the GPU to upclock until
the completion of that request. By changing the order in which we waited
upon requests, we ended up waiting on those requests in sequence and as
such we saw that each request was already started and so not a suitable
candidate for waitboosting.
Instead of asking whether to boost each fence in turn, we can look at
whether boosting is required for the dma-resv ensemble prior to waiting
on any fence, making the heuristic more robust to the order in which
fences are stored in the dma-resv.
Reported-by: Thomas Voegtle <tv@lio96.de>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
Fixes: 047a1b877e ("dma-buf & drm/amdgpu: remove dma_resv workaround")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
Tested-by: Thomas Voegtle <tv@lio96.de>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/07e05518d9f6620d20cc1101ec1849203fe973f9.1657289332.git.karolina.drobnik@intel.com
[Why]
When playing NV12 1080p MPO video, it is pipe splitting so
we see two pipes in fullscreen and four pipes in windowed
mode. Pipe split is happening because we are setting
MaximumMPCCombine = 1
[How]
Algorithm for MaximumMPCCombine has extra conditions we do
not need. Use DCN31 algorithm instead
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Black screen encountered when disabling Freesync through OSD on some
displays.
[How]
Set the should_disable flag when new top pipe has no plane state to
ensure that pipes get cleaned up.
Reviewed-by: Chris Park <Chris.Park@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
When unplug one sst monitor from a mst hub and plug in the same
port with another sst monitor, we don't read the corresponding
edid. That's because we detect there is already an edid stored in
aconnector->edid which is a stale one.
[How]
Clean up aconnector->edid when unplug mst connector.
Reviewed-by: Hersen Wu <hersen.wu@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
In single display configuration, windowed MPO does not work
with ODM combine.
[How]
For ODM + MPO window on one half of ODM, only 3 pipes should
be allocated and scaling parameters adjusted to handle this case.
Otherwise, we use 4 pipes.
Move copy_surface_update_to_plane() before dc_add_plane_to_context()
so that it gets the updated rect information when setting up
the pipes.
Add dc_check_boundary_crossing_for_windowed_mpo_with_odm() to force
a full update when we cross a boundary requiring us to reconfigure
the number of pipes between 3 and 4 pipes.
Set config.enable_windowed_mpo_odm to true when we have the
debug.enable_single_display_2to1_odm_policy set to true.
Don't fail validating ODM with windowed MPO if
config.enable_windowed_mpo_odm is true.
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Description]
Exit SubVP if MPO is in use since SubVP + MPO together is not supported.
- Don't add SubVP at validation time if we see MPO is in use
Issues fixed in the SubVP / MPO transition:
1. Enable phantom pipes in post unlock function to prevent underflow
when an active pipe is being transitioned to be a phantom pipe (VTG
updates take place right away). Also must wait for VUPDATE of the main
pipe to complete first
2. Don't wait for MPCC idle when transitioning a phantom pipe to an
actual pipe. MPCC_STATUS is never asserted due to OTG being off for
phantom pipes
3. When transitioning an active pipe to phantom, program DET right away
(same as disabling the pipe) or the DET update will only take when
the phantom pipe is enabled which can cause DET allocation errors.
4. For K1/K2 programming of phantom pipes, use same settings as the
main pipe. Also don't program K1 / K2 = 0xF ever since the field is only
1 / 2 bits wide.
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For GMC 10 parts which support scatter/gather display (display
from system memory), we should allocate a larger gart size
to better handler larger displays. This mirrors what we already
do for GMC 9 parts.
v2: fix typo (Alex)
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Need reserve buffers before unmap mes ctx bo va.
v2: fix removal of dma_resv_excl_fence() (Alex)
v3: fix dma_resv_usage (Alex)
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mediatek DRM Next for Linux 5.20
1. Add Mediatek Soc DRM (vdosys0) support for mt8195
2. Cooperate with DSI RX devices to modify dsi funcs and delay mipi high to cooperate with panel sequence
3. Add mt8186 dsi compatible and convert dsi_dtbinding to .yaml
4. Add MediaTek SoC DRM (vdosys1) support for mt8195
5. Add MT8195 dp_intf driver
Signed-off-by: Dave Airlie <airlied@redhat.com>
[airlied: fix drm_edid.h include]
From: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220709142021.24260-1-chunkuang.hu@kernel.org
drm/tegra: Changes for v5.20-rc1
The bulk of these changes adds support for context isolation for the
various supported host1x engines, as well as support for the hardware
found on the new Tegra234 SoC generation.
There's also a couple of fixes and cleanups. To round things off, the
device tree bindings are converted to the new json-schema format that
allows DTBs to be validated.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thierry Reding <thierry.reding@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220708181136.673789-1-thierry.reding@gmail.com
In exynos7_decon_resume, When it fails, we must use clk_disable_unprepare()
to free resource that have been used.
Fixes: 6f83d20838 ("drm/exynos: use DRM_DEV_ERROR to print out error
message")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jian Zhang <zhangjian210@huawei.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
drm-misc-next for $kernel-version:
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
* crtc: Remove unnessary include statements from drm_crtc.h, plus
fallout in drivers
* edid: More use of struct drm_edid; implement HF-EEODB extension
Driver Changes:
* bridge:
* anx7625: Implement HDP timeout via callback; Cleanups
* fsl-ldb: Drop DE flip; Modesetting fixes
* imx: Depend on ARCH_MXC
* sil8620: Fix off-by-one
* ti-sn65dsi86: Convert to atomic modesetting
* ingenic: Fix display at maximum resolution
* panel:
* simple: Add support for HannStar HSD101PWW2, plus DT bindings; Add
support for ETML0700Y5DHA, plus DT bindings
* rockchip: Fixes
* vc4: Cleanups
* vmwgfx: Cleanups
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/YsaHq1pvE699NtOM@linux-uq9g
Always run fbdev removal first to remove simpledrm via sysfb_disable().
This clears the internal state.
The later call to drm_aperture_detach_drivers() then does nothing.
Otherwise, with drm_aperture_detach_drivers() running first, the call to
sysfb_disable() uses inconsistent state.
Example backtrace show below:
BUG: KASAN: use-after-free in device_del+0x79/0x5f0
Read of size 8 at addr ffff888108185050 by task systemd-udevd/311
CPU: 0 PID: 311 Comm: systemd-udevd Tainted: G E 5.19.0-rc2-1-default+ #1689
Hardware name: HP ProLiant DL120 G7, BIOS J01 04/21/2011
Call Trace:
device_del+0x79/0x5f0
platform_device_del.part.0+0x19/0xe0
platform_device_unregister+0x1c/0x30
sysfb_disable+0x2d/0x70
remove_conflicting_framebuffers+0x1c/0xf0
remove_conflicting_pci_framebuffers+0x130/0x1a0
drm_aperture_remove_conflicting_pci_framebuffers+0x86/0xb0
mgag200_pci_probe+0x2d/0x140 [mgag200]
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 873eb3b118 ("fbdev: Disable sysfb device registration when removing conflicting FBs")
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Changcheng Deng <deng.changcheng@zte.com.cn>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For some cases (accessing registers, unmap legacy queue), it needs
access mes in atomic context. Use spinlock to protect agaist mes
ring buffer race condition.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why&How]
Output Data Mapping is a power saving feature that allows us to run at
reduced DPP and DISP clocks compared to what could be achieved with a
single pipe.
Set the default policy for single display use case to use 2 to 1 ODM combine.
The options are queried by DC and appropriate register programming sequence
is initiated to enable this feature.
Fixes: 235c676342 ("drm/amd/display: add DCN32/321 specific files for Display Core")
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why&How]
Add a missing callback to set DIG FIFO output pixel mode. This is used
when ODM combine is activated.
Fixes: 235c676342 ("drm/amd/display: add DCN32/321 specific files for Display Core")
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Although all DSS belong to a single pool on Xe_HP platforms (i.e.,
they're not organized into slices from a topology point of view), we do
still need to pass 'group' and 'instance' targets when steering register
accesses to a specific instance of a per-DSS multicast register. The
rules for how to determine group and instance IDs (which previously used
legacy terms "slice" and "subslice") varies by platform. Some platforms
determine steering by gslice membership, some platforms by cslice
membership, and future platforms may have other rules.
Since looping over each DSS and performing steered unicast register
accesses is a relatively common pattern, let's add a dedicated iteration
macro to handle this (and replace the platform-specific "instdone" loop
we were using previously. This will avoid the calling code needing to
figure out the details about how to obtain steering IDs for a specific
DSS.
Most of the places where we use this new loop are in the GPU errorstate
code at the moment, but we do have some additional features coming in
the future that will also need to loop over each DSS and steer some
register accesses accordingly.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220701232006.1016135-1-matthew.d.roper@intel.com
There are several things wrong here. First, none of these
numbers are FP, so there is no need to cast to double. Next
make sure to use proper 64 bit division helpers.
Fixes: 85f4bc0c33 ("drm/amd/display: Add SubVP required code")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use the devm_platform_ioremap_resource() helper instead of calling
platform_get_resource() and devm_ioremap_resource() separately. Make the
code simpler without functional changes.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them.
It is less verbose and it improves the semantic.
While at it, remove a useless bitmap_zero() call. The bitmap is already
zero'ed when allocated.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Thierry Reding <treding@nvidia.com>
host1x_cdma_push_wide() had the assumptions that the last parameter word
was a NOP opcode, and that NOP opcodes could be used in all situations.
Neither are true with the new job opcode sequence, so adjust the
function to not have these assumptions, and instead place an early
RESTART opcode when necessary to jump back to the beginning of the
pushbuffer.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
During the refactoring of channel_submit(), assignment of syncval was
moved but it is also used in channel_submit(). Add this assignment back
to channel_submit() as well.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The global sseu config is applicable only to gen11 platforms where
concurrent media, render and OA use cases may cause some subslices to be
turned off and hence lose NOA configuration. Ideally we want to return
ENODEV for non-gen11 platforms, however, this has shipped with gfx12, so
disable only for gfx12.50+.
v2: gfx12 is already shipped with this, disable for gfx12.50+ (Lionel)
v3: (Matt)
- Update commit message and replace "12.5" with "12.50"
- Replace DRM_DEBUG() with driver specific drm_dbg()
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220707193002.2859653-2-umesh.nerlige.ramappa@intel.com
Even though the IOVA API never actually needed it, iova.h is still
carrying an include of dma-mapping.h, now solely for the sake of not
breaking tegra-drm. Fix that properly.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The code assumes that Tegra GEM is permanently vmapped, which is not
true for the scattered buffers. After converting Tegra video decoder
driver to V4L API, we're now getting a BUG_ON from dma-buf core on playing
video using libvdpau-tegra on T30+ because tegra_gem_prime_vmap() sets
vaddr to NULL. Older pre-V4L video decoder driver wasn't vmapping dma-bufs.
Fix it by actually vmapping the exported GEMs.
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
drivers/gpu/drm/tegra/vic.c:326:12: error: ‘vic_runtime_suspend’ defined but not used [-Werror=unused-function]
static int vic_runtime_suspend(struct device *dev)
^~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/tegra/vic.c:292:12: error: ‘vic_runtime_resume’ defined but not used [-Werror=unused-function]
static int vic_runtime_resume(struct device *dev)
^~~~~~~~~~~~~~~~~~
Mark it as __maybe_unused.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Conditional registration is a problem for other subsystems which may
unwittingly try to interact with host1x_context_device_bus_type in an
uninitialised state on non-Tegra platforms. A look under /sys/bus on a
typical system already reveals plenty of entries from enabled but
otherwise irrelevant configs, so lets keep things simple and register
our context bus unconditionally too.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Add Tegra234 support for VIC. It is backwards compatible with
Tegra194.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
When MLOCK enforcement is enabled, the 0-word write currently done
is rejected by the hardware outside of an MLOCK region. As such,
on these chips, which also have the newer, more convenient RESTART_W
opcode, use that instead to skip over the timed out job.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
With the full-featured opcode sequence using MLOCKs, we need to also
unlock those MLOCKs in the event of a timeout. However, it turns out
that on Tegra186/Tegra194, by default, we don't need to do this;
furthermore, on Tegra234 it is much simpler to do; so only implement
this on Tegra234 for the time being.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
For new (Tegra186+) SoCs, use a new ('full-featured') job opcode
sequence that is compatible with virtualization. In particular,
the Host1x hardware in Tegra234 is more strict regarding the sequence,
requiring ACQUIRE_MLOCK-SETCLASS-SETSTREAMID opcodes to occur in
that sequence without gaps (except for SETPAYLOAD), so let's do it
properly in one go now.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>