The field order selection in VDIC_C register uses different bits
depending on whether the VDIC is receiving from a CSI ("AUTO") or
from memory ("MAN"). Since the VDIC cannot receive from both CSI
and memory at the same time, set or clear both field order bits to
cover both cases.
Signed-off-by: Steve Longerbeam <steve_longerbeam@mentor.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
This is not used anymore since commit eb8c88808c ("drm/imx: add
deferred plane disabling"), remove it.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Most of the 64 IPUv3 DMA channels are never used, some of them (channels
16, 30, 32, 34-39, and 53-63) are even marked as reserved.
Allocate the channel control structure only when a channel is actually
requested, replace the fixed size array with a list, and remove the
unused enabled and busy fields from the ipuv3_channel structure.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Allow to skip writing odd chroma rows by setting the RDRW bit for
4:2:0 chroma subsampled formats for any IDMAC write channel. This
also allows to skip reading odd rows for the VDIC read channel.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
The counter load enable bit has no effect when the shadow register
set is activated. As we always operate the PRG with shadow enabled
it is safe to remove this.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
during the emulation of virtual reset:
1. only reset the engine related mmio ending with MMIO
offset Master_IRQ, not include display stuff.
2. fences are not required to set default
value as well to prevent screen flicking.
this will fix the issue of Guest screen hang while running
Force tdr in Linux guest.
v2:
- only reset the engine related mmio. (Zhenyu & Zhiyuan)
v3:
- IMR/Ring mode registers are not save/restored. (Changbin)
v4:
- redefine the MMIO reset offset for easy understanding. (Zhenyu)
- pvinfo can be reset. (Zhenyu)
v5:
- add more comments for mmio reset. (Zhenyu)
Cc: Changbin Du <changbin.du@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Lv zhiyuan <zhiyuan.lv@intel.com>
Cc: Zhang Yulei <yulei.zhang@intel.com>
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Emulating the GDRST read behavior correctly to ack the
guest reset request.
v2:
- split the original patch into two:
GDRST read handler and virtual gpu reset. (Zhenyu)
v3:
- emulate the GDRST read right after write. (Zhenyu)
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhang Yulei <yulei.zhang@intel.com>
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
On Skylake platform, The traced virtual mmio registers are up to 2039.
So tuning the hash table size to improve lookup performance.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
We count all the tracked virtual MMIO registers, which can help us to
tune the MMIO hash table.
v2: Move num_tracked_mmio into gvt structure.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Function calls are expensive. I have see obvious overhead call to
these wrappers in perf data, especially from the cmd parser side.
So make these simple wrappers be inline to kill them all.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Type u8 is big enough to contain all MMIO attribute flags. As the
total MMIO size is 2MB so we saved 1.5MB memory.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
The size, length, addr_mask fields actually are not necessary. Every
tracked mmio has DWORD size, and addr_mask is a legacy field.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Some of traced MMIO registers are a large continuous section. These
stuffed the MMIO lookup hash table and so waste lots of memory and
get much lower lookup performance.
Here we picked out these sections by special handling. These sections
include:
o Display pipe registers, total 768.
o The PVINFO page, total 1024.
o MCHBAR_MIRROR, total 65536.
o CSR_MMIO, total 3072.
So we removed 70,400 items from the hash table, and speed up guest
boot time by ~500ms.
v2:
o add a local function find_mmio_block().
o fix comments.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
In some cases, GVT-g is accessing MMIO without holding runtime_pm
and this patch can add the inline API for doing the runtime_pm get/put
to make sure when accessing HW MMIO the i915 HW is really powered on.
Suggested-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
This flag is already set in the top level Makefile of the kernel.
Also, by having set CONFIG_DRM_I915_GVT, thereby appending -Wall to
ccflags, you undo all the -Wno-* cflags previously set in the Make
variable KBUILD_CFLAGS.
For example:
cc foo.c -Wall -Wno-format -Wall
resets -Wformat.
Signed-off-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
remove all the legacy pre-BDW mmio handlers and the corresponding
usage/definition since pre-BDW platforms are not supported in GVT
environment.
v2:
- clean up all the left dirty code before BDW, e.g
all D_HSW usage and itself, D_IVB, D_PRE_BDW. (Zhenyu)
v3:
- change is based on gvt-staging. (Zhenyu)
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
The time based scheduler poll context busy status at every
micro-second during vGPU switch, it will make GPU idle for a while
when the context is very small and completed before the next
micro-second arrival. Trigger scheduling immediately after context
complete will eliminate GPU idle and improve performance.
Create two vGPU with same type, run Heaven simultaneously:
Before this patch:
+---------+----------+----------+
| | vGPU1 | vGPU2 |
+---------+----------+----------+
| Heaven | 357 | 354 |
+-------------------------------+
After this patch:
+---------+----------+----------+
| | vGPU1 | vGPU2 |
+---------+----------+----------+
| Heaven | 397 | 398 |
+-------------------------------+
v2: Let need_reschedule protect by gvt-lock.
Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
This patch decouple the time slice calculation and scheduler, let
other event be able to trigger scheduling without impact the
calculation for QoS.
v2: add only one new enum definition.
v3: fix typo.
Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Since cmd message have been recorded in trace, gvt_dbg_cmd isn't
necessary. This will reduce much of dmesg as gvt_dbg_cmd is repeated
on each workload.
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Currently gvt dmesg is so heavy at drm.debug=0x2 that guest and
host almost couldn't run on xengt.
This patch transfer these repeated messages into trace, so dmesg
is light at drm.debug=0x2, and user could get the target message through
trace event and trace filter.
Suggested-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
kernel hangcheck needs to check RING_INSTDONE and SC_INSTDONE registers'
state to know if hardware is still running. In GVT-g environment, we need
to emulate these registers changing for all the guests although they are
not render owner. Here we return the physical state for all the guests,
then if INSTDONE is changing guest can know hardware is still running
although its workload is pending.
Read INSTDONE isn't one correct way to know if guest trigger gfx reset,
especially with Linux guest, it will read ACTH first, then check INSTDONE
and SUBSLICE registers to check if hardware is still running, at last
trigger gfx reset when it finds all the registers is frozen. In Windows
guest, read INSTDONE usually happens when OS detect TDR.
With the difference between Windows and Linux guest, "disable_warn_untrack"
may let debug log run into wrong state(Linux guest trigger hangcheck
with no ACTHD changed, then check INSTDONE), but actually there is no TDR
happened.
The new policy is always WARN with untrack MMIO r/w. Bad effect is many
noisy untrack mmio warning logs exist when real TDR happen. Even so you can
control the log output or not by setting the debug mask bit.
v2: remove log in instdone_mmio_read
Suggested-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Commit ab9da627906a ("drm/i915: make context status notifier head be
per engine") gives us a chance to inspect every single request. Then
we can eliminate unnecessary mmio switching for same vGPU. We only
need mmio switching for different VMs (including host).
This patch introduced a new general API intel_gvt_switch_mmio() to
replace the old intel_gvt_load/restore_render_mmio(). This function
can be further optimized for vGPU to vGPU switching.
To support individual ring switch, we track the owner who occupy
each ring. When another VM or host request a ring we do the mmio
context switching. Otherwise no need to switch the ring.
This optimization is very useful if only one guest has plenty of
workloads and the host is mostly idle. The best case is no mmio
switching will happen.
v2:
o fix missing ring switch issue. (chuanxiao)
o support individual ring switch.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Reviewed-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
The function intel_vgpu_submit_execlist could be more simpler. It
actually does:
1) validate the submission. The first context must be valid,
and all two must be privilege_access.
2) submit valid contexts. The first one need emulate schedule_in.
We do not need a bitmap, valid desc copy valid_desc. Local variable
emulate_schedule_in also can be optimized out.
v2: dump desc content in err msg (Zhi Wang)
Signed-off-by: Changbin Du <changbin.du@intel.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
The gvt:gvt_command trace involve unnecessary overhead even this trace is
not enabled. We need improve it.
The kernel trace infrastructure provide a full api to define a trace event.
We should leverage them if possible. And one important thing is that a trace
point should store raw data but not format string.
This patch include two part work:
1) Refactor the gvt_command trace definition, including:
o only store raw trace data.
o use __dynamic_array() to declare a variable size buffer.
o use __print_array() to format raw cmd data.
o rename vm_id as vgpu_id.
2) Improve the trace invoking, including:
o remove the cycles calculation for handler. We can get this data
by any perf tool.
o do not make a backup for raw cmd data which just doesn't make sense.
With this patch, this trace has no overhead if it is not enabled. And we are
trace style now.
The final output example:
gvt workload 0-211 [000] ...1 120.555964: gvt_command: vgpu1 ring 0: buf_type 0, ip_gma e161e880, raw cmd {0x4000000}
gvt workload 0-211 [000] ...1 120.556014: gvt_command: vgpu1 ring 0: buf_type 0, ip_gma e161e884, raw cmd {0x7a000004,0x1004000,0xe1511018,0x0,0x7d,0x0}
gvt workload 0-211 [000] ...1 120.556062: gvt_command: vgpu1 ring 0: buf_type 0, ip_gma e161e89c, raw cmd {0x7a000004,0x140000,0x0,0x0,0x0,0x0}
gvt workload 0-211 [000] ...1 120.556110: gvt_command: vgpu1 ring 0: buf_type 0, ip_gma e161e8b4, raw cmd {0x10400002,0xe1511018,0x0,0x7d}
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
This was missed when Andres' queue patches were rebased.
Fixes: 42794b27 (drm/amdgpu: take ownership of per-pipe configuration v3)
Reviewed-by: Alex Xie <AlexBin.Xie@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes hangs on single MEC asics.
Fixes: 2ed286fb434 (drm/amdgpu: new queue policy, take first 2 queues of each pipe v2)
Reviewed-by: Alex Xie <AlexBin.Xie@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CNL power wells are very similar to SKL, with the exception that the
misc IO well has been split into separate AUX IO wells.
Not sure if DMC is supposed to manage the AUX wells for us or not.
Let's assume so for now.
v2: DDI A power well wants DDI A domains, not DDI B domains
v3: s/BIT/BIT_ULL and add proper Aux IO domains. (Rodrigo)
v4: Remove PW_DDI_E. Not supported on Current CNL SKUs. (Rodrigo).
v5: Removed DDI_E_IO_DOMAINS and moved PORT_DDI_E_IO to DDI_A_IO
for the same reasons as v4 when we found out that current CNL
SKUs don't have the full port E split.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1496781040-20888-10-git-send-email-rodrigo.vivi@intel.com
Cannonlake is a Intel® Processor containing Intel® HD Graphics
following Kabylake.
It is Gen10.
Let's start by adding the platform definition based on previous
platforms but yet as alpha_support.
On following patches we will start adding PCI IDs and the
platform specific changes.
CNL has an increased DDB size as Damien had previously
noticed and provided a separated patch that got squashed here.
v2: Squash DDB size here per Ander request.
Credits-to: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1496781040-20888-1-git-send-email-rodrigo.vivi@intel.com