All these ROUNDING_FACTORs and whatnot are making this thing hard to
read. Get rid of them. And let's massage some of the fractions to
give us less questionable intermediate results and perhaps less
divisions.
Also looks like a good helping of 64bit math stuff is needed to
avoid some of overflows present in the current code. There
might still be a few overflows, namely when calculating
link_clks_available/samples_room (would require a huge hblank
though), and potentially when calculating hblank_rise (not sure
how large link_clks_active can get).
It looks like we're still not calculating exactly what the spec says
since we truncate tu_data and tu_line early. But I'm too lazy to
figure out if we could avoid that.
v2: Fix typo in commit msg (Uma)
Remove ROUNDING_FACTOR define (Uma)
s/5*link_clk+5*cdclk/5*(link_clk+cdclk)/ (Chris)
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429185457.26235-3-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Display WA #1105 says that FBC requires PLANE_STRIDE to be a multiple
of 512 bytes on gen9 and glk.
This is definitely true for glk as certain tests (such as
igt/kms_big_fb/linear-16bpp-rotate-0) are now failing when the
display resolution results in a plane stride which is not a
multiple of 512 bytes.
Curiously I was not able to reproduce this on a KBL. First I
suspected that our use of the FBC override stride explain this,
but after trying to use the override stride on glk the test
still failed. I did try both the old CHICKEN_MISC_4 way and
the new FBC_STRIDE way, neither had any effect on the result.
Anyways, we need this at least on glk. But let's trust the spec
and apply the w/a for all gen9 as well, despite being unable to
reproduce the problem.
v2: s/FBC_CHICKEN/FBC_STRIDE/ in commit msg
Cc: José Roberto de Souza <jose.souza@intel.com>
Fixes: 691f7ba58d ("drm/i915/display/fbc: Make fences a nice-to-have for GEN9+")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429101034.8208-2-ville.syrjala@linux.intel.com
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
We need to calculate SAGV mask also in a non-modeset
commit, however currently active_pipes are only calculated
for modesets in global atomic state, thus now we will be
tracking those also in bw_state in order to be able to
properly access global data.
v2: - Removed pre/post plane SAGV updates from modeset(Ville)
- Now tracking active pipes in intel_can_enable_sagv(Ville)
v3: - lock global state if active_pipes change as well(Ville)
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200430195634.7666-1-stanislav.lisovskiy@intel.com
Future platforms require per-crtc SAGV evaluation
and serializing global state when those are changed
from different commits.
v2: - Add has_sagv check to intel_crtc_can_enable_sagv
so that it sets bit in reject mask.
- Use bw_state in intel_pre/post_plane_enable_sagv
instead of atomic state
v3: - Fixed rebase conflict, now using
intel_atomic_crtc_state_for_each_plane_state in
order to call it from atomic check
v4: - Use fb modifier from plane state
v5: - Make intel_has_sagv static again(Ville)
- Removed unnecessary NULL assignments(Ville)
- Removed unnecessary SAGV debug(Ville)
- Call intel_compute_sagv_mask only for modesets(Ville)
- Serialize global state only if sagv results change, but
not mask itself(Ville)
v6: - use lock global state instead of serialize(Ville)
v7: - use both global state lock and serialize depending on
if we need to change only global state or access hw
(Ville)
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Cc: Ville Syrjälä <ville.syrjala@intel.com>
Cc: James Ausmus <james.ausmus@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200430191757.18206-1-stanislav.lisovskiy@intel.com
The older arches did not convert MI_STORE_DATA_IMM to using the GTT, but
left them writing to a physical address. The notes suggest that the
primary reason would be so that the writes were cache coherent, as the
CPU cache uses physical tagging. As such we did not implement the
legacy variant of MI_STORE_DATA_IMM and so left all the relocations
synchronous -- but with a small function to convert from the vma address
into the physical address, we can implement asynchronous relocs on these
older arches, fixing up a few tests that require them.
In order to be able to test the legacy paths, refactor the gpu
relocations so that we can hook them up to a selftest.
v2: Use an array of offsets not enum labels for the selftest
v3: Refactor the common igt_hexdump()
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/757
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200504140629.28240-1-chris@chris-wilson.co.uk
Older compilers warn about initializers with incorrect curly
braces:
drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c: In function 'sun6i_dsi_encoder_enable':
drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c:720:8: error: missing braces around initializer [-Werror=missing-braces]
union phy_configure_opts opts = { 0 };
^
drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c:720:8: error: (near initialization for 'opts.mipi_dphy') [-Werror=missing-braces]
Use the GNU empty initializer extension to avoid this.
Fixes: bb3b6fcb68 ("sun6i: dsi: Convert to generic phy handling")
Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428215105.3928459-1-arnd@arndb.de
If at first we don't succeed, try try again.
Not all engines may support the MI ops we need to perform asynchronous
relocation patching, and so we end up falling back to a synchronous
operation that has a liability of blocking. However, Tvrtko pointed out
we don't need to use the same engine to perform the relocations as we
are planning to execute the execbuf on, and so if we switch over to a
working engine, we can perform the relocation asynchronously. The user
execbuf will be queued after the relocations by virtue of fencing.
This patch creates a new context per execbuf requiring asynchronous
relocations on an unusable engines. This is perhaps a bit excessive and
can be ameliorated by a small context cache, but for the moment we only
need it for working around a little used engine on Sandybridge, and only
if relocations are actually required to an active batch buffer.
Now we just need to teach the relocation code to handle physical
addressing for gen2/3, and we should then have universal support!
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Testcase: igt/gem_exec_reloc/basic-spin # snb
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200501192945.22215-3-chris@chris-wilson.co.uk
The ring is a precious resource: we anticipate to only use a few hundred
bytes for a request, and only try to reserve that before we start. If we
go beyond our guess in building the request, then instead of waiting at
the start of execbuf before we hold any locks or other resources, we
may trigger a wait inside a critical region. One example is in using gpu
relocations, where currently we emit a new MI_BB_START from the ring
every time we overflow a page of relocation entries. However, instead of
insert the command into the precious ring, we can chain the next page of
relocation entries as MI_BB_START from the end of the previous.
v2: Delay the emit_bb_start until after all the chained vma
synchronisation is complete. Since the buffer pool batches are idle, this
_should_ be a no-op, but one day we may some fancy async GPU bindings
for new vma!
v3: Use pool/batch consitently, once we start thinking in terms of the
batch vma, use batch->obj.
v4: Explain the magic number 4.
Tvrtko spotted that we lose propagation of the error for failing to
submit the relocation request; that's easier to fix up in the next
patch.
Testcase: igt/gem_exec_reloc/basic-many-active
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200501192945.22215-1-chris@chris-wilson.co.uk
Create gfx_v10_0_rlc_funcs_sriov for nv12 with rlcg_write function pointers be initialized
so driver can use RLCG to write aceess CSIB and CP_ME_CNTL registers when nv12 in sriov mode
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The queue mask used for set_resources always assumes the queue number
per pipe is 8, so KFD needs to align with that by using function
amdgpu_queue_mask_bit_to_set_resource_bit().
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rename it to amdgpu_queue_mask_bit_to_set_resource_bit() to be more
specific about its functionality. KFD will use it later.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
IP discovery is only supported in Navi series and onwards.
There is no need to reserve a portion of vram as discovery
tmr region for pre-Navi adapters.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use a common method to set queue mask before set kiq resource.
The value of queue mask must suitablt for the designated form.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rename hardware IP name from UVD to VCN to reduce confusion.
Hardware IP name UVD and VCN are equivalent for VCN2.5 asics.
Use name VCN for future VCN based asics.
V2: update description
V3: rebase
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: James Zhu <james.zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
With previous golden settings, compute task can't use
reserved LDS (32K) on CU0 and CU1. On 64K LDS system,
if compute work group allocate more than 32K LDS, then
it can't be dispatched to CU0 and CU1 because of the
reservation. This enables compute task to use reserved
LDS on CU0 and CU1.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
PCI domain has moved to 32-bits to accommodate virtualization,
so a 32-bit integer is exposed for domain to reflect this change.
Domain can be found in here:
/sys/class/kfd/kfd/topology/nodes/X/properties
Where X is the card number
Signed-off-by: Ori Messinger <ori.messinger@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The dcn20_validate_bandwidth function would have code touching the
incorrect registers emitted outside of the boundaries of the
DC_FP_START/END macros, at least on ppc64le. Work around the
problem by wrapping the whole function instead.
Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.6.x
Once should generally be enough for diagnosing what lead up to it,
repeating it over and over can be pretty annoying.
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CG/PG ungate is already performed in ip_suspend_phase1. Otherwise,
the CG/PG ungate will be performed twice. That will cause gfxoff
disablement is performed twice also on runpm enter while gfxoff
enablemnt once on rump exit. That will put gfxoff into disabled
state.
Fixes: b2a7e9735a ("drm/amdgpu: fix the hw hang during perform system reboot and reset")
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
This sequence change should be safe as what did in ip_suspend_phase1
is to suspend DCE only. And this is a prerequisite for coming
redundant cg/pg ungate dropping.
Fixes: 487eca11a3 ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)")
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/gpu/drm/amd/display/dc/dcn10/dcn10_stream_encoder.c:1277:11: warning: variable ‘speakers’ set but not used [-Wunused-but-set-variable]
It is introduced by commit 0c41891c81 ("drm/amd/display:
Refactor stream encoder for HW review"), but never used, so remove it.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>