For perfomance purpose, scanning of non-privileged batch buffer is turned
off by default. But for debugging purpose, it can be turned on via debugfs.
After scanning, we submit the original non-privileged batch buffer into
hardware, so that the scanning is only a peeking window of guest submitted
commands and will not affect the execution results.
v4:
- refine debugfs print format&content (zhenyu wang)
- print engine id instread of engine name to prevent potential memory leak
in debugfs warning message. (zhenyu wang)
v3:
- change vgpu->scan_nonprivbb from type bool to u32, so it is able to
selectively turn on/off scanning of non-privileged batch buffer on engine
level. e.g.
if vgpu->scan_nonprivbb=3, then it will scan non-privileged batch buffer
on engine 0 and 1.
- in debugfs interface to set vgpu->scan_nonprivbb, print warning message
to warn user and explicitly tell state change in kernel log (zhenyu wang)
v2:
- rebase
- update comments for start_gma_offset (henry)
Signed-off-by: Zhao Yan <yan.y.zhao@intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJauCZfAAoJEHm+PkMAQRiGWGUH/2rhdQDkoJpYWnjaQkolECG8
MxpGE7nmIIHxQcbSDdHTGJ8IhVm6Z5wZ7ym/PwCDTT043Y1y341sJrIwL2/nTG6d
HVidk8hFvgN6QzlzVAHT3ZZMII/V9Zt+VV5SUYLGnPAVuJNHo/6uzWlTU5g+NTFo
IquFDdQUaGBlkKqby+NoAFnkV1UAIkW0g22cfvPnlO5GMer0gusGyVNvVp7TNj3C
sqj4Hvt3RMDLMNe9RZ2pFTiOD096n8FWpYftZneUTxFImhRV3Jg5MaaYZm9SI3HW
tXrv/LChT/F1mi5Pkx6tkT5Hr8WvcrwDMJ4It1kom10RqWAgjxIR3CMm448ileY=
=YKUG
-----END PGP SIGNATURE-----
Backmerge tag 'v4.16-rc7' into drm-next
Linux 4.16-rc7
This was requested by Daniel, and things were getting
a bit hard to reconcile, most of the conflicts were
trivial though.
Once the ring buffer is copied to ring_scan_buffer and scanned,
the shadow batch buffer start address is only updated into
ring_scan_buffer, not the real ring address allocated through
intel_ring_begin in later copy_workload_to_ring_buffer.
This patch is only to set the right shadow batch buffer address
from Ring buffer, not include the shadow_wa_ctx.
v2:
- refine some comments. (Zhenyu)
v3:
- fix typo in title. (Zhenyu)
v4:
- remove the unnecessary comments. (Zhenyu)
- add comments in bb_start_cmd_va update. (Zhenyu)
Fixes: 0a53bc07f0 ("drm/i915/gvt: Separate cmd scan from request allocation")
Cc: stable@vger.kernel.org # v4.15
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Yulei Zhang <yulei.zhang@intel.com>
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
When populating shadow ctx from guest, we should handle oa related
registers in hw ctx, so that they will not be overlapped by guest oa
configs. This patch made it possible to capture oa data from host for
both host and guests.
Signed-off-by: Min He <min.he@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
To pull in the HDCP changes, especially wait_for changes to drm/i915
that Chris wants to build on top of.
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
We want to de-emphasize the link between the request (dependency,
execution and fence tracking) from GEM and so rename the struct from
drm_i915_gem_request to i915_request. That is we may implement the GEM
user interface on top of requests, but they are an abstraction for
tracking execution rather than an implementation detail of GEM. (Since
they are not tied to HW, we keep the i915 prefix as opposed to intel.)
In short, the spatch:
@@
@@
- struct drm_i915_gem_request
+ struct i915_request
A corollary to contracting the type name, we also harmonise on using
'rq' shorthand for local variables where space if of the essence and
repetition makes 'request' unwieldy. For globals and struct members,
'request' is still much preferred for its clarity.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180221095636.6649-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Using per engine ops will be more flexible, here refine sub-ops(init,
clean) as per engine operation align with reset operation. This change also
will be used in next fix patch for VM engine reset.
Cc: Fred Gao <fred.gao@intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Convert the macro to a function which should always be preferred.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
1) Use standard i915 GEM object sequence to access the shadow batch buffer.
2) Manage i915 vma life cycle to solve one FIXME.
v2:
- Refine code structure.
- Refine the usage of GEM APIs.
- Add the missing lock/unlock in release_shadow_batch_buffer.
Test on my SKL NuC.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Introduce vGPU submission ops to support easy switching submission mode
of one vGPU between different OSes.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Move common vGPU workload creation functions into scheduler.c since
they are not specific to execlist emulation.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
To move workload related functions into scheduler.c, an expected way is
to collect all the init/clean functions related to vGPU workload
submission into fewer functions.
Rename intel_vgpu_{init, clean}_gvt_context() for above usage in future.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJZ9kEFAAoJEHm+PkMAQRiGw6wH/0j197qyGd0hkVFMJO6LAgN3
KQWS4nZ5BkVDocwv0RVnUJTtXqU1eozFgdVEtSoaFXpzlHGuptR2Tau9efDCJ7w3
/utZxqvhGebZd2T+j+/o/LE8BRQxhADBNJq2D/o0WNt8ecxuG0GIkhkEYt/o3z1v
/sxlwVwzXB7Dc/h1WcgGJG7cS6L9KzzAzGAS/iNvdFrPOygHBv8c0MxVZIiBIeeK
1nZdyvbyM8uenSyG+prGt9ENrqXZxxfwUxIchi2V7A9m1WmD5zijNkf1JCWji/O+
UsA1auxna7MwoxjxqZuGm4MlKOwZ+8xutk4JGgc+aP/ulndJbJYu+4op/3vaFBM=
=Mhx+
-----END PGP SIGNATURE-----
Backmerge tag 'v4.14-rc7' into drm-next
Linux 4.14-rc7
Requested by Ben Skeggs for nouveau to avoid major conflicts,
and things were getting a bit conflicty already, esp around amdgpu
reverts.
Need to check valid state for per_ctx bb and bypass batch buffer
combine for scan if necessary. Otherwise adding invalid MI batch
buffer start cmd for per_ctx bb will cause scan failure, which is
taken as -EFAULT now so vGPU would be put in failsafe. This trys
to fix that by checking per_ctx bb valid state. Also remove old
invalid WARNING that indirect ctx bb shouldn't depend on valid
per_ctx bb.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
When an error occurs after shadow_indirect_ctx, this patch is to do the
proper cleanup and rollback to the original states for shadowed indirect
context before the workload is abandoned.
v2:
- split the mixed several error paths for better review. (Zhenyu)
v3:
- no return check for clean up functions. (Changbin)
v4:
- expose and reuse the existing release_shadow_wa_ctx. (Zhenyu)
v5:
- move the release function to scheduler.c file. (Zhenyu)
v6:
- move error handling code of intel_gvt_scan_and_shadow_workload
to here. (Zhenyu)
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Let the workload audit and shadow ahead of vGPU scheduling, that
will eliminate GPU idle time and improve performance for multi-VM.
The performance of Heaven running simultaneously in 3VMs has
improved 20% after this patch.
v2:Remove condition current->vgpu==vgpu when shadow during ELSP
writing.
Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Commit ab9da627906a ("drm/i915: make context status notifier head be
per engine") gives us a chance to inspect every single request. Then
we can eliminate unnecessary mmio switching for same vGPU. We only
need mmio switching for different VMs (including host).
This patch introduced a new general API intel_gvt_switch_mmio() to
replace the old intel_gvt_load/restore_render_mmio(). This function
can be further optimized for vGPU to vGPU switching.
To support individual ring switch, we track the owner who occupy
each ring. When another VM or host request a ring we do the mmio
context switching. Otherwise no need to switch the ring.
This optimization is very useful if only one guest has plenty of
workloads and the host is mostly idle. The best case is no mmio
switching will happen.
v2:
o fix missing ring switch issue. (chuanxiao)
o support individual ring switch.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Reviewed-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
intel_shadow_wa_ctx is a field of intel_vgpu_workload. container_of() can
be used to refine the relation-ship between intel_shadow_wa_ctx and
intel_vgpu_workload. This patch removes the useless dereference.
v2. add "drm/i915/gvt" prefix. (Zhenyu)
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
set_gma_to_bb_cmd() is completely bogus - it is (incorrectly) applying
the rules to read a GTT offset from a command as opposed to writing the
GTT offset. And to cap it all set_gma_to_bb_cmd() is called within a list
iterator of the most strange construction.
Fixes: be1da7070a ("drm/i915/gvt: vGPU command scanner")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Yulei Zhang <yulei.zhang@intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Tested-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
This patch introduces a command scanner to scan guest command buffers.
Signed-off-by: Yulei Zhang <yulei.zhang@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
This patch introduces a vGPU schedule policy framework, with a timer based
schedule policy module for now
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
This patch introduces the vGPU workload scheduler routines.
GVT workload scheduler is responsible for picking and executing GVT workload
from current scheduled vGPU. Before the workload is submitted to host i915,
the guest execlist context will be shadowed in the host GVT shadow context.
the instructions in guest ring buffer will be copied into GVT shadow ring
buffer. Then GVT-g workload scheduler will scan the instructions in guest
ring buffer and submit it to host i915.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
This patch introduces the vGPU workload submission logics.
Under virtualization environment, guest will submit workload through
virtual execlist submit port. The submitted workload load will be wrapped
into an gvt workload which will be picked by GVT workload scheduler and
executed on host i915 later.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>