A mirror of the official Linux kernel repository just in case
Go to file
Matthew Brost 6a07990384 drm/i915/guc: Add delay to disable scheduling after pin count goes to zero
Add a delay, configurable via debugfs (default 34ms), to disable
scheduling of a context after the pin count goes to zero. Disable
scheduling is a costly operation as it requires synchronizing with
the GuC. So the idea is that a delay allows the user to resubmit
something before doing this operation. This delay is only done if
the context isn't closed and less than a given threshold
(default is 3/4) of the guc_ids are in use.

As temporary WA disable this feature for the selftests. Selftests are
very timing sensitive and any change in timing can cause failure. A
follow up patch will fixup the selftests to understand this delay.

Alan Previn: Matt Brost first introduced this series back in Oct 2021.
However no real world workload with measured performance impact was
available to prove the intended results. Today, this series is being
republished in response to a real world workload that benefited greatly
from it along with measured performance improvement.

Workload description: 36 containers were created on a DG2 device where
each container was performing a combination of 720p 3d game rendering
and 30fps video encoding. The workload density was configured in a way
that guaranteed each container to ALWAYS be able to render and
encode no less than 30fps with a predefined maximum render + encode
latency time. That means the totality of all 36 containers and their
workloads were not saturating the engines to their max (in order to
maintain just enough headrooom to meet the min fps and max latencies
of incoming container submissions).

Problem statement: It was observed that the CPU core processing the i915
soft IRQ work was experiencing severe load. Using tracelogs and an
instrumentation patch to count specific i915 IRQ events, it was confirmed
that the majority of the CPU cycles were caused by the
gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
majority of the cycles was determined to be processing a specific G2H
IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent
by GuC in response to i915 KMD sending H2G requests:
INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
whenever a context goes idle so that we can unpin the context from GuC.
The high CPU utilization % symptom was limiting density scaling.

Root Cause Analysis: Because the incoming execution buffers were spread
across 36 different containers (each with multiple contexts) but the
system in totality was NOT saturated to the max, it was assumed that each
context was constantly idling between submissions. This was causing
a thrashing of unpinning contexts from GuC at one moment, followed quickly
by repinning them due to incoming workload the very next moment. These
event-pairs were being triggered across multiple contexts per container,
across all containers at the rate of > 30 times per sec per context.

Metrics: When running this workload without this patch, we measured an
average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
seconds or ~10 million times over ~25+ mins. With this patch, the count
reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The
improvement observed is ~99% for the average counts per 10 seconds.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220817020511.2180747-3-alan.previn.teres.alexis@intel.com
2022-08-18 15:23:47 -07:00
arch Merge drm/drm-next into drm-intel-next 2022-04-26 16:44:31 +03:00
block for-5.18/block-2022-04-01 2022-04-01 16:20:00 -07:00
certs Kbuild updates for v5.18 2022-03-31 11:59:03 -07:00
crypto for-5.18/64bit-pi-2022-03-25 2022-03-26 12:01:35 -07:00
Documentation drm/doc/rfc: VM_BIND uapi definition 2022-07-01 16:47:07 -07:00
drivers drm/i915/guc: Add delay to disable scheduling after pin count goes to zero 2022-08-18 15:23:47 -07:00
fs Driver core changes for 5.18-rc2 2022-04-10 09:55:09 -10:00
include drm/i915/mtl: Add MeteorLake PCI IDs 2022-07-08 13:25:33 -07:00
init Kbuild updates for v5.18 2022-03-31 11:59:03 -07:00
ipc fs: allocate inode by using alloc_inode_sb() 2022-03-22 15:57:03 -07:00
kernel Merge drm/drm-next into drm-misc-next 2022-04-18 20:46:55 +01:00
lib Driver core changes for 5.18-rc2 2022-04-10 09:55:09 -10:00
LICENSES LICENSES/LGPL-2.1: Add LGPL-2.1-or-later as valid identifiers 2021-12-16 14:33:10 +01:00
mm - Allow the compiler to optimize away unused percpu accesses and change 2022-04-10 06:56:46 -10:00
net NFS client bugfixes for Linux 5.18 2022-04-08 07:39:17 -10:00
samples vfio/mdev: Remove mdev_parent_ops 2022-04-21 07:36:56 -04:00
scripts modpost: restore the warning message for missing symbol versions 2022-04-03 03:11:51 +09:00
security hardening updates for v5.18-rc1-fix1 2022-03-31 11:43:01 -07:00
sound sound fixes for 5.18-rc1 2022-04-01 10:32:46 -07:00
tools - Fix the MSI message data struct definition 2022-04-10 07:12:27 -10:00
usr Kbuild updates for v5.18 2022-03-31 11:59:03 -07:00
virt KVM: Remove dirty handling from gfn_to_pfn_cache completely 2022-04-02 05:34:41 -04:00
.clang-format genirq/msi: Make interrupt allocation less convoluted 2021-12-16 22:22:20 +01:00
.cocciconfig
.get_maintainer.ignore
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: ignore only top-level modules.builtin 2021-05-02 00:43:35 +09:00
.mailmap mailmap: update Vasily Averin's email address 2022-04-08 14:20:36 -10:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: replace a Microchip AT91 maintainer 2022-02-09 11:30:01 +01:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS drm/i915/gsc: add gsc as a mei auxiliary device 2022-04-21 11:33:56 -07:00
Makefile Linux 5.18-rc2 2022-04-10 14:21:36 -10:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.