Commit Graph

1075420 Commits

Author SHA1 Message Date
Alex Deucher
45a3e06be4 drm/amdgpu: Use IP versions in convert_tiling_flags_to_modifier()
Rather than checking the asic_type.

Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:07 -05:00
Dillon Varone
fe5e8f07fc drm/amd/display: Modify plane removal sequence to avoid hangs.
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Alan Liu <HaoPing.Liu@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:07 -05:00
Danijel Slivka
7952fa0d3e drm/amd/pm: new v3 SmuMetrics data structure for Sienna Cichlid
structure changed in smc_fw_version >= 0x3A4900,
    "uint16_t VcnActivityPercentage" replaced with
    "uint16_t VcnUsagePercentage0" and "uint16_t VcnUsagePercentage1"

Signed-off-by: Danijel Slivka <danijel.slivka@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:07 -05:00
Prike Liang
d7709eb6a1 drm/amdgpu: enable gfxoff routine for GC 10.3.7
Enable gfxoff routine for GC 10.3.7.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:07 -05:00
Prike Liang
fabe175385 drm/amdgpu: enable gfx power gating for GC 10.3.7
Enable gfx power gating for GC 10.3.7.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
Prike Liang
9a1358bb2c drm/amdgpu/nv: enable clock gating for GC 10.3.7 subblock
This will enable the following block clock gating.

 - MC
 - SDMA
 - HDP
 - ATHUB
 - IH
 - VCN/JPEG

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
Prike Liang
00bfab4457 drm/amdgpu: enable gfx clock gating control for GC 10.3.7
Enable gfx cg gate/ungate control for GC 10.3.7.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
Qiang Yu
b6901d93cc drm/amdgpu: fix suspend/resume hang regression
Regression has been reported that suspend/resume may hang with
the previous vm ready check commit.

So bring back the evicted list check as a temp fix.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1922
Fixes: c1a66c3bc4 ("drm/amdgpu: check vm ready by amdgpu_vm->evicting flag")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Qiang Yu <qiang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
Yifan Zha
e6fac6a9c9 drm/amdgpu: Move CAP firmware loading to the beginning of PSP firmware list
[Why]
As PSP needs to verify the signature, CAP firmware must be loaded first when PSP loads firmwares.
Otherwise, when DFC feature is enabled, CP firmwares would be loaded failed.

[ 1149.160480] [drm] MM table gpu addr = 0x800022f000, cpu addr = 00000000a62afcea.
[ 1149.209874] [drm] failed to load ucode CP_CE(0x8)
[ 1149.209878] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0007)
[ 1149.215914] [drm] failed to load ucode CP_PFP(0x9)
[ 1149.215917] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0007)
[ 1149.221941] [drm] failed to load ucode CP_ME(0xA)
[ 1149.221944] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0007)
[ 1149.228082] [drm] failed to load ucode CP_MEC1(0xB)
[ 1149.228085] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0007)
[ 1149.234209] [drm] failed to load ucode CP_MEC2(0xD)
[ 1149.234212] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0007)
[ 1149.242379] [drm] failed to load ucode VCN(0x1C)
[ 1149.242382] [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0007)

[How]
Move CAP UCODE ID to the beginning of AMDGPU_UCODE_ID enum list.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Reviewed-by: Bokun Zhang <Bokun.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
Andrey Grodzovsky
5aa061474b drm/amdgpu: Bump minor version for hot plug tests enabling.
This will allow to enable the tests only after latest fix
after which the tests passed on my system.

I tested on NV21 standalone and Vega 10 and Polaris as
pair with DRI_PRIME.

It's possible there might be still issues on ASICs i don't
have at my posession but that that the point of enbling
the tests finally - if other people during testing will
encounter errors they will report and I will be able to fix.

The releated merge request for enabling libdrm tests suite  is in
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/227

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
Andrey Grodzovsky
57230f0ce6 drm/amdgpu: Fix sigsev when accessing MMIO on hot unplug.
Protect with drm_dev_enter/exit

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
Yifan Zhang
7d4108e4ce drm/amdgpu: convert code name to ip version for noretry set
Use IP version rather than codename for noretry set.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
Yifan Zhang
957b0787ee drm/amdgpu: move amdgpu_gmc_noretry_set after ip_versions populated
otherwise adev->ip_versions is still empty when amdgpu_gmc_noretry_set
is called.

Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
yipechai
80e0c2cb37 drm/amdgpu: Remove redundant .ras_fini initialization in some ras blocks
1. Define amdgpu_ras_block_late_fini_default in amdgpu_ras.c as
   .ras_fini common function, which is called when
   .ras_fini of ras block isn't initialized.
2. Remove the code of using amdgpu_ras_block_late_fini to
   initialize .ras_fini in ras blocks.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
yipechai
30e58102d5 drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in mca ras block
Remove redundant calls of amdgpu_ras_block_late_fini in mca ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
yipechai
149d7ba1f8 drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in sdma ras block
Remove redundant calls of amdgpu_ras_block_late_fini in sdma ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
yipechai
aa8e65dfc7 drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in hdp ras block
Remove redundant calls of amdgpu_ras_block_late_fini in hdp ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
yipechai
f148c143ef drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in xgmi ras block
Remove redundant calls of amdgpu_ras_block_late_fini in xgmi ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
yipechai
0dca257d6d drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in umc ras block
Remove redundant calls of amdgpu_ras_block_late_fini in umc ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
yipechai
f578a37d19 drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in nbio ras block
Remove redundant calls of amdgpu_ras_block_late_fini in nbio ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:06 -05:00
yipechai
9dad47c50f drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in mmhub ras block
Remove redundant calls of amdgpu_ras_block_late_fini in mmhub ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
yipechai
35366481d0 drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in gfx ras block
Remove redundant calls of amdgpu_ras_block_late_fini in gfx ras block.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
yipechai
1f211a827c drm/amdgpu: centrally calls the .ras_fini function of all ras blocks
centrally calls the .ras_fini function of all ras blocks.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
yipechai
667c7091a3 drm/amdgpu: Optimize xxx_ras_fini function of each ras block
1. Move the variables of ras block instance members from
   specific xxx_ras_fini to general ras_fini call.
2. Function calls inside the modules only use parameters
   passed from xxx_ras_fini instead of ras block instance
   members.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
yipechai
01d468d9a4 drm/amdgpu: Modify .ras_fini function pointer parameter
Modify .ras_fini function pointer parameter so that
we can remove redundant intermediate calls in some
ras blocks.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
Shah Dharati
b51759661e drm/amd/display: Adding a dc_debug option and dmub setting to use PHY FSM for PSR
[Why]
PSR Power on/off is done in PSR. Add a dc_debug option
and dmub setting to use PHY implementation of this instead.

[How]
Add a dc_debug option and dmub setting to use
PHY FSM Power up/down for PSR.

Co-authored-by: Shah Dharati <dharati.shah@amd.com>
Reviewed-by: Hansen Dsouza <hansen.dsouza@amd.com>
Acked-by: Alan Liu <HaoPing.Liu@amd.com>
Signed-off-by: Shah Dharati <dharati.shah@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
Dillon Varone
91dcfe5fd9 drm/amd/display: Add frame alternate 3D & restrict HW packed on dongles
[WHY?]
Some projectors support frame alternate 3D modes at 120Hz, but DAL3 does
not create timings. Most active DP to HDMI dongles do not translate
infoframes properly to use HW packing stereo mode.

[HOW?]
Create frame alternate 3D timings for displays that support it. Disable HW
packing 3D mode on DP active dongles.

Reviewed-by: Martin Leung <Martin.Leung@amd.com>
Acked-by: Alan Liu <HaoPing.Liu@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
Tom Rix
ca6fcfa8d4 drm/amdgpu: Fix realloc of ptr
Clang static analysis reports this error
amdgpu_debugfs.c:1690:9: warning: 1st function call
  argument is an uninitialized value
  tmp = krealloc_array(tmp, i + 1,
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~

realloc uses tmp, so tmp can not be garbage.
And the return needs to be checked.

Fixes: 5ce5a584cb ("drm/amdgpu: add debugfs for reset registers list")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
Chris Park
4affb12303 drm/amd/display: Reset VIC if HDMI_VIC is present
[Why]
HDMI Compliance requires VIC to be set to 0
on 2D mode if HDMI_VIC is present.

[How]
When VIC and HDMI_VIC is both present,
reset VIC to 0.

Reviewed-by: Martin Leung <Martin.Leung@amd.com>
Acked-by: Alan Liu <HaoPing.Liu@amd.com>
Signed-off-by: Chris Park <Chris.Park@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
Nicholas Kazlauskas
6dc0fded62 drm/amd/display: Make functional resource functions non-static
[Why & How]
To align coding style for how we use this across DCN. The resource
creation ones can remain static, however.

Reviewed-by: Eric Yang <Eric.Yang2@amd.com>
Acked-by: Alan Liu <HaoPing.Liu@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
Hansen Dsouza
1e242bf8bc drm/amd/display: Remove invalid RDPCS Programming in DAL
RDPCS programming is done in DMUB remove legacy invalid code

Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com>
Acked-by: Alan Liu <HaoPing.Liu@amd.com>
Signed-off-by: Hansen Dsouza <Hansen.Dsouza@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
Philip Yang
d58b8a99cb drm/amdkfd: Add SMI add event helper
To remove duplicate code, unify event message format and simplify new
event add in the following patches.

Use KFD_SMI_EVENT_MSG_SIZE to define msg size, the same size will be
used in user space to alloc the msg receive buffer.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
Philip Yang
38abd56bed drm/amdkfd: Correct SMI event read size
sizeof(buf) is 8 bytes because it is defined as unsigned char *buf,
each SMI event read only copy max 8 bytes to user buffer. Correct this
by using the buf allocate size.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
Philip Yang
e433d68433 Revert "drm/amdkfd: process_info lock not needed for svm"
This reverts commit 3abfe30d80.

To fix deadlock in kFDSVMEvictTest when xnack off.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
Meng Tang
b8cb6ab686 gpu/amd: vega10_hwmgr: fix inappropriate private variable name
In file vega10_hwmgr.c, the names of struct vega10_power_state *
and struct pp_power_state * are confusingly used, which may lead
to some confusion.

Status quo is that variables of type struct vega10_power_state *
are named "vega10_ps", "ps", "vega10_power_state". A more
appropriate usage is that struct are named "ps" is used for
variabled of type struct pp_power_state *.

So rename struct vega10_power_state * which are named "ps" and
"vega10_power_state" to "vega10_ps", I also renamed "psa" to
"vega10_psa" and "psb" to "vega10_psb" to make it more clearly.

The rows longer than 100 columns are involved.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Meng Tang <tangmeng@uniontech.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:40:05 -05:00
Luben Tuikov
52e8da704d drm/amd/display: Don't fill up the logs
Don't fill up the logs with:

[253557.859575] [drm:amdgpu_dm_atomic_check [amdgpu]] DSC precompute is not needed.
[253557.892966] [drm:amdgpu_dm_atomic_check [amdgpu]] DSC precompute is not needed.
[253557.926070] [drm:amdgpu_dm_atomic_check [amdgpu]] DSC precompute is not needed.
[253557.959344] [drm:amdgpu_dm_atomic_check [amdgpu]] DSC precompute is not needed.

which prints many times a second, when the kernel is run with
drm.debug=2.

Instead of DRM_DEBUG_DRIVER(), make it DRM_INFO_ONCE().

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Roman Li <Roman.Li@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Hersen Wu <hersenwu@amd.com>
Cc: Daniel Wheeler <daniel.wheeler@amd.com>
Fixes: 17ce8a6907 ("drm/amd/display: Add dsc pre-validation in atomic check")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02 18:38:51 -05:00
Tommy Haung
e41d27eaf5 drm/aspeed: Add AST2600 chip support
Add AST2600 chip support and setting.

Signed-off-by: Tommy Haung <tommy_huang@aspeedtech.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Link: https://patchwork.freedesktop.org/patch/msgid/20220302024930.18758-5-tommy_huang@aspeedtech.com
2022-03-03 09:08:35 +10:30
Tommy Haung
5e2421ce79 drm/aspeed: Update INTR_STS handling
Add interrupt clear register define for further chip support.

Signed-off-by: Tommy Haung <tommy_huang@aspeedtech.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Link: https://patchwork.freedesktop.org/patch/msgid/20220302024930.18758-4-tommy_huang@aspeedtech.com
2022-03-03 09:08:35 +10:30
Thomas Zimmermann
9ae2ac4d31 drm: Add TODO item for optimizing format helpers
Add a TODO item for optimizing blitting and format-conversion helpers
in DRM and fbdev. There's always demand for faster graphics output.

v4:
	* fix typos (Sam)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220223193804.18636-6-tzimmermann@suse.de
2022-03-02 20:26:02 +01:00
Thomas Zimmermann
0d03011894 fbdev: Improve performance of cfb_imageblit()
Improve the performance of cfb_imageblit() by manually unrolling
the inner blitting loop and moving some invariants out. The compiler
failed to do this automatically. This change keeps cfb_imageblit()
in sync with sys_imagebit().

A microbenchmark measures the average number of CPU cycles
for cfb_imageblit() after a stabilizing period of a few minutes
(i7-4790, FullHD, simpledrm, kernel with debugging).

cfb_imageblit(), new: 15724 cycles
cfb_imageblit(): old: 30566 cycles

In the optimized case, cfb_imageblit() is now ~2x faster than before.

v3:
	* fix commit description (Pekka)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220223193804.18636-5-tzimmermann@suse.de
2022-03-02 20:22:33 +01:00
Thomas Zimmermann
3c54c95bd9 fbdev: Remove trailing whitespaces from cfbimgblt.c
Fix coding style. No functional changes.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220223193804.18636-4-tzimmermann@suse.de
2022-03-02 20:22:06 +01:00
Thomas Zimmermann
6f29e04938 fbdev: Improve performance of sys_imageblit()
Improve the performance of sys_imageblit() by manually unrolling
the inner blitting loop and moving some invariants out. The compiler
failed to do this automatically. The resulting binary code was even
slower than the cfb_imageblit() helper, which uses the same algorithm,
but operates on I/O memory.

A microbenchmark measures the average number of CPU cycles
for sys_imageblit() after a stabilizing period of a few minutes
(i7-4790, FullHD, simpledrm, kernel with debugging). The value
for CFB is given as a reference.

  sys_imageblit(), new: 25934 cycles
  sys_imageblit(), old: 35944 cycles
  cfb_imageblit():      30566 cycles

In the optimized case, sys_imageblit() is now ~30% faster than before
and ~20% faster than cfb_imageblit().

v2:
	* move switch out of inner loop (Gerd)
	* remove test for alignment of dst1 (Sam)

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220223193804.18636-3-tzimmermann@suse.de
2022-03-02 20:20:46 +01:00
Thomas Zimmermann
7dbc515f5c fbdev: Improve performance of sys_fillrect()
Improve the performance of sys_fillrect() by using word-aligned
32/64-bit mov instructions. While the code tried to implement this,
the compiler failed to create fast instructions. The resulting
binary instructions were even slower than cfb_fillrect(), which
uses the same algorithm, but operates on I/O memory.

A microbenchmark measures the average number of CPU cycles
for sys_fillrect() after a stabilizing period of a few minutes
(i7-4790, FullHD, simpledrm, kernel with debugging). The value
for CFB is given as a reference.

  sys_fillrect(), new:  26586 cycles
  sys_fillrect(), old: 166603 cycles
  cfb_fillrect():       41012 cycles

In the optimized case, sys_fillrect() is now ~6x faster than before
and ~1.5x faster than the CFB implementation.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20220223193804.18636-2-tzimmermann@suse.de
2022-03-02 20:20:34 +01:00
Srinivasan Shanmugam
b2006061ae drm/i915/xehpsdv: Move render/compute engine reset domains related workarounds
Registers that exist in the shared render/compute reset domain need to
be placed on an engine workaround list to ensure that they are properly
re-applied whenever an RCS or CCS engine is reset.  We have a number of
workarounds (updating registers MLTICTXCTL, L3SQCREG1_CCS0,
GEN12_MERT_MOD_CTRL, and GEN12_GAMCNTRL_CTRL) that are incorrectly
implemented on the 'gt' workaround list and need to be moved
accordingly.

Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.s@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-14-matthew.d.roper@intel.com
2022-03-02 06:52:42 -08:00
Matt Roper
ff6b19d3a0 drm/i915/xehp: Add compute workarounds
Additional workarounds are required once we start exposing CCS engines.

Note that we have a number of workarounds that update registers in the
shared render/compute reset domain.  Historically we've just added such
registers to the RCS engine's workaround list.  But going forward we
should be more careful to place such workarounds on a wa_list for an
engine that definitely exists and is not fused off (e.g., a platform
with no RCS would never apply the RCS wa_list).  We'll keep
rcs_engine_wa_init() focused on RCS-specific workarounds that only need
to be applied if the RCS engine is present.  A separate
general_render_compute_wa_init() function will be used to define
workarounds that touch registers in the shared render/compute reset
domain and that we need to apply regardless of what render and/or
compute engines actually exist.  Any workarounds defined in this new
function will internally be added to the first present RCS or CCS
engine's workaround list to ensure they get applied (and only get
applied once rather than being needlessly re-applied several times).

Co-author: Srinivasan Shanmugam
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-13-matthew.d.roper@intel.com
2022-03-02 06:52:42 -08:00
Daniele Ceraolo Spurio
88ed07cb27 drm/i915/xehp: handle fused off CCS engines
HW resources are divided across the active CCS engines at the compute
slice level, with each CCS having priority on one of the cslices.
If a compute slice has no enabled DSS, its paired compute engine is not
usable in full parallel execution because the other ones already fully
saturate the HW, so consider it fused off.

v2 (José):
 - moved it to its own function
 - fixed definition of ccs_mask

v3 (Matt):
 - Replace fls() condition with a simple IP version test

v4 (Matt):
 - Don't try to calculate a ccs_mask using
   intel_slicemask_from_dssmask() until we've determined that we're
   running on an Xe_HP platform where the logic makes sense (and won't
   overflow).

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220302052008.1884985-1-matthew.d.roper@intel.com
2022-03-02 06:52:42 -08:00
Matthew Brost
e393e2aa0a drm/i915/xehp: Don't support parallel submission on compute / render
A different emit breadcrumbs ring programming is required for compute /
render and we don't have UMD user so just reject parallel submission for
these engine classes.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-11-matthew.d.roper@intel.com
2022-03-02 06:52:42 -08:00
Daniele Ceraolo Spurio
ea4ca894a1 drm/i915/xehp/guc: enable compute engine inside GuC
Tell GuC that CCS is enabled by setting the CCS mask in its ADS.

Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Original-author: Michel Thierry
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-10-matthew.d.roper@intel.com
2022-03-02 06:52:09 -08:00
Matt Roper
87cb6d80f2 drm/i915/xehp: Enable ccs/dual-ctx in RCU_MODE
We have to specify in the Render Control Unit Mode register
when CCS is enabled.

v2:
 - Move RCU_MODE programming to a helper function.  (Tvrtko)
 - Clean up and clarify comments.  (Tvrtko)
 - Add RCU_MODE to the GuC save/restore list.  (Daniele)
v3:
 - Move this patch before the GuC ADS update to enable compute engines;
   the definition of RCU_MODE and its insertion into the save/restore
   list moves to this patch.  (Daniele)
v4:
 - Call xehp_enable_ccs_engines() directly in guc_resume() and
   execlists_resume() rather than adding an extra layer of wrapping to
   the engine->resume() vfunc.  (Umesh)

Bspec: 46034
Original-author: Michel Thierry
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220302001554.1836066-1-matthew.d.roper@intel.com
2022-03-02 06:45:21 -08:00
Matt Roper
adfadb5638 drm/i915/xehp: Define context scheduling attributes in lrc descriptor
In Dual Context mode the EUs are shared between render and compute
command streamers. The hardware provides a field in the lrc descriptor
to indicate the prioritization of the thread dispatch associated to the
corresponding context.

The context priority is set to 'low' at creation time and relies on the
existing context priority to set it to low/normal/high.

Bspec: 46145, 46260
Original-author: Michel Thierry
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Prasad Nallani <prasad.nallani@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-8-matthew.d.roper@intel.com
2022-03-02 06:45:21 -08:00